METHOD FOR IDENTIFYING GENES ENCODING NOVEL
SECRETED OR MEMBRANE-ASSOCIATED PROTEINS

Background of the Invention 
The invention relates to methods for identifying
genes encoding novel proteins.

There is considerable medical interest in secreted 
and membrane-associated mammalian proteins. Many such 
proteins, for example, cytokines, are important for
inducing the growth or differentiation of cells with
which they interact or for triggering one or more 
specific cellular responses. 

An important goal in the design and development of 
new therapies is the identification and characterization
of secreted proteins and the genes which encode them.

Traditionally, this goal has been pursued by identifying 
a particular response of a particular cell type and 
attempting to isolate and purify a secreted protein 
capable of eliciting the response. This approach is
limited by a number of factors. First, certain secreted
proteins will not be identified because the responses 
they evoke may not be recognizable or measurable. 
Second, because in vitro assays must be used to isolate 
and purify secreted proteins, somewhat artificial systems
must be used. This raises the possibility that certain
important secreted proteins will not be identified unless 
the features of the in vitro system (e.g., cell line, 
culture medium, or growth conditions) accurately reflect 
the in vivo milieu. Third, the complexity of the effects
of secreted proteins on the cells with which they
interact vastly complicates the task of isolating 
important secreted proteins. Any given cell can be 
simultaneously subject to the effects of two or more 
secreted proteins. Because any two secreted proteins 
will not have the same effect on a given cell and because
the effect of a first secreted protein on a given cell
can alter the effect of a second secreted protein on the
same cell, it can be difficult to isolate the secreted
protein or proteins responsible for a given physiological
response. In addition, certain secreted and membrane-
associated proteins may be expressed at levels that are
too low to detect by biological assay or protein
purification.

In another approach, genes encoding secreted
proteins have been isolated using DNA probes or PCR
oligonucleotides which recognize sequence motifs present
in genes encoding known secreted proteins. In addition,
homology-directed searching of Expressed Sequence Tag
(EST) sequences derived by high-throughput sequencing of
specific cDNA libraries has been used to identify genes
encoding secreted proteins. These approaches depend for
their success on a high degree of similarity between the
DNA sequences used as probes and the unknown genes or EST
sequences.

More recently, methods have been developed that 
permit the identification of cDNAs encoding a signal 
sequence capable of directing the secretion of a 
particular protein from certain cell types. Both Honjo,
U.S. Patent No. 5,525,486, and Jacobs, U.S. Patent No.
5,536,637, describe such methods. These methods are said
to be capable of identifying secreted proteins. 

The demonstrated clinical utility of several 
secreted proteins in the treatment of human disease, for
example, erythropoietin, granulocyte-macrophage colony
stimulating factor (GM-CSF), human growth hormone, and
various interleukins, has generated considerable interest
in the identification of novel secreted proteins. The
method of the invention can be employed as a tool in the
discovery of such novel proteins.
Summary of the Invention
The invention features a method for isolating
and identifying cDNAs that encode secreted or membrane-
associated (e.g., transmembrane) mammalian proteins. The
method of the invention relies upon the observation that
the majority of secreted and membrane-associated proteins
possess at their amino termini a stretch of hydrophobic
amino acid residues referred to as the "signal sequence."
The signal sequence directs secreted and membrane-
associated proteins to a sub-cellular membrane
compartment termed the endoplasmic reticulum, from which
these proteins are dispatched for secretion or
presentation on the cell surface.

The invention describes a method in which cDNAs
that encode signal sequences for secreted or membrane-
associated proteins are isolated by virtue of their
ability to direct the export of a reporter protein,
alkaline phosphatase (AP), from mammalian cells. The
present method has major advantages over other signal
peptide trapping approaches. The present method is
highly sensitive. This facilitates the isolation of
signal peptide associated proteins that may be difficult
to isolate with other techniques. Moreover, the present
method is amenable to high-throughput screening techniques
and automation. Combined with a novel method for cDNA
library construction in which directional random primed
cDNA libraries are prepared, the invention comprises a
powerful approach to the large scale isolation of
novel secreted proteins.

The invention features a method for identifying a
cDNA nucleic acid encoding a mammalian protein having a
signal sequence, which method includes the following
steps:

a) providing a library of mammalian cDNA;
b) ligating the library of mammalian cDNA to DNA
encoding alkaline phosphatase lacking both a signal
sequence and a membrane anchor sequence to form ligated
DNA;

c) transforming bacterial cells with the ligated
DNA to create a bacterial cell clone library;

d) isolating DNA comprising the mammalian cDNA
from at least one clone in the bacterial cell clone
library;

e) separately transfecting DNA isolated from
clones in step (d) into mammalian cells which do not
express alkaline phosphatase to create a mammalian cell
clone library wherein each clone in the mammalian cell
clone library corresponds to a clone in the bacterial
cell clone library;

f) identifying a clone in the mammalian cell clone
library which expresses alkaline phosphatase;

g) identifying the clone in the bacterial cell
clone library corresponding to the clone in the mammalian
cell clone library identified in step (f); and

h) isolating and sequencing a portion of the
mammalian cDNA present in the bacterial cell library
clone identified in step (g) to identify a mammalian cDNA
encoding a mammalian protein having a signal sequence.

A cDNA library is a collection of nucleic acid
molecules that are a cDNA copy of a sample of mRNA.

In another aspect, the invention features the ptrAP3
expression vector.

In another aspect, the invention features a
substantially pure preparation of ethb0018f2 protein.
Preferably, the ethb0018f2 protein includes an amino acid
sequence substantially identical to the amino acid
sequence shown in FIG. 5 (SEQ ID NO: 5) and is derived from
a mammal, for example, a human.
The invention also features purified DNA (for
example, cDNA) which includes a sequence encoding an
ethb0018f2 protein, preferably encoding a human
ethb0018f2 protein (for example, the ethb0018f2 protein
of FIG. 5; SEQ ID NO: 5); a vector and a cell which
include a purified DNA of the invention; and a method of
producing a recombinant ethb0018f2 protein involving
providing a cell transformed with DNA encoding ethb0018f2
protein positioned for expression in the cell, culturing
the transformed cell under conditions for expressing the
DNA, and isolating the recombinant ethb0018f2 protein.
The invention further features recombinant ethb0018f2
protein produced by such expression of a purified DNA of
the invention.

By "ethb0018f2 protein" is meant a polypeptide
which has a biological activity possessed by naturally-
occurring ethb0018f2 protein. Preferably, such a
polypeptide has an amino acid sequence which is at least
85%, preferably 90%, and most preferably 95% or even 99%
identical to the amino acid sequence of the ethb0018f2
protein of FIG. 5 (SEQ ID NO: 5).

By "substantially identical" is meant a 
polypeptide or nucleic acid having a sequence that is at 
least 85%, preferably 90%, and more preferably 95% or
more identical to the sequence of the reference amino
acid or nucleic acid sequence. For polypeptides, the 
length of the reference polypeptide sequence will 
generally be at least 16 amino acids, preferably at least 
20 amino acids, more preferably at least 25 amino acids,
and most preferably 35 amino acids. For nucleic acids,
the length of the reference nucleic acid sequence will 
generally be at least 50 nucleotides, preferably at least 
60 nucleotides, more preferably at least 75 nucleotides, 
and most preferably 110 nucleotides. 
Sequence identity can be measured using sequence
analysis software (e.g., Sequence Analysis Software
Package of the Genetics Computer Group, University of
Wisconsin Biotechnology Center, 1710 University Avenue,
Madison, WI 53705).

In the case of polypeptide sequences which are 
less than 100% identical to a reference sequence, the 
non-identical positions are preferably, but not
necessarily, conservative substitutions for the reference
sequence. Conservative substitutions typically include
substitutions within the following groups: glycine and 
alanine; valine, isoleucine, and leucine; aspartic acid 
and glutamic acid; asparagine and glutamine; serine and 
threonine; lysine and arginine; and phenylalanine and
tyrosine.
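
The grouping above can be expressed as a simple lookup. The following is a minimal sketch only, written in Python with one-letter amino acid codes; the function name and data structure are illustrative assumptions and are not part of the described method.

```python
# Conservative substitution groups exactly as listed in the text,
# written with standard one-letter amino acid codes.
CONSERVATIVE_GROUPS = [
    {"G", "A"},        # glycine, alanine
    {"V", "I", "L"},   # valine, isoleucine, leucine
    {"D", "E"},        # aspartic acid, glutamic acid
    {"N", "Q"},        # asparagine, glutamine
    {"S", "T"},        # serine, threonine
    {"K", "R"},        # lysine, arginine
    {"F", "Y"},        # phenylalanine, tyrosine
]


def is_conservative(ref_residue: str, new_residue: str) -> bool:
    """Return True if two different residues fall within the same group."""
    a, b = ref_residue.upper(), new_residue.upper()
    if a == b:
        return False  # an identical residue is not a substitution
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)
```

For instance, `is_conservative("K", "R")` is true (lysine and arginine share a group), while `is_conservative("G", "V")` is false under the groups listed here.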

Where a particular polypeptide is said to have a
specific percent identity to a reference polypeptide of a
defined length, the percent identity is relative to the
reference polypeptide. Thus, a peptide that is 50% identical
to a reference polypeptide that is 100 amino acids long
can be a 50 amino acid polypeptide that is completely
identical to a 50 amino acid long portion of the
reference polypeptide. It might also be a 100 amino acid
long polypeptide which is 50% identical to the reference
polypeptide over its entire length. Of course, many
other polypeptides will meet the same criteria.
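
The two cases in the preceding paragraph can be checked numerically. The sketch below assumes the two sequences have already been aligned to equal length with "-" marking gaps; the function name, the gap convention, and the toy sequences are illustrative assumptions rather than anything prescribed by the specification.

```python
def percent_identity(aligned_query: str, aligned_reference: str) -> float:
    """Percent identity computed relative to the reference length.

    Both arguments are pre-aligned strings of equal length; '-' marks a gap.
    The denominator is the number of reference residues, not the query length.
    """
    if len(aligned_query) != len(aligned_reference):
        raise ValueError("aligned sequences must have equal length")
    reference_length = sum(1 for r in aligned_reference if r != "-")
    if reference_length == 0:
        raise ValueError("reference contains no residues")
    matches = sum(
        1 for q, r in zip(aligned_query, aligned_reference)
        if r != "-" and q == r
    )
    return 100.0 * matches / reference_length


# Toy 100-residue reference (arbitrary letters, for illustration only).
reference = "ACDEFGHIKL" * 10

# Case 1: a 50-residue polypeptide identical to the first half of the reference.
half_length_query = reference[:50] + "-" * 50

# Case 2: a 100-residue polypeptide that differs at every second position.
full_length_query = "".join(
    r if i % 2 == 0 else "W" for i, r in enumerate(reference)
)
```

Both `percent_identity(half_length_query, reference)` and `percent_identity(full_length_query, reference)` evaluate to 50.0, matching the two situations described in the text.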

By "protein" and "polypeptide" is meant any chain 
of amino acids, regardless of length or post-
translational modification (e.g., glycosylation or
phosphorylation).

By "substantially pure" is meant a preparation
which is at least 60% by weight (dry weight) the compound
of interest, i.e., an ethb0018f2 protein. Preferably the
preparation is at least 75%, more preferably at least
90%, and most preferably at least 99%, by weight the
compound of interest. Purity can be measured by any
appropriate method, e.g., column chromatography,
polyacrylamide gel electrophoresis, or HPLC analysis.

WO 98/22491    PCT/US97/20201

By "purified DNA" is meant DNA that is not
immediately contiguous with both of the coding sequences
with which it is immediately contiguous (one on the 5'
end and one on the 3' end) in the naturally occurring
genome of the organism from which it is derived. The
term therefore includes, for example, a recombinant DNA
which is incorporated into a vector; into an autonomously
replicating plasmid or virus; or into the genomic DNA of
a prokaryote or eukaryote, or which exists as a separate
molecule (e.g., a cDNA or a genomic DNA fragment produced
by PCR or restriction endonuclease treatment) independent
of other sequences. It also includes a recombinant DNA
which is part of a hybrid gene encoding additional
polypeptide sequence.

By "substantially identical" is meant an amino
acid sequence which differs only by conservative amino
acid substitutions, for example, substitution of one
amino acid for another of the same class (e.g., valine
for glycine, arginine for lysine, etc.) or by one or more
non-conservative substitutions, deletions, or insertions
located at positions of the amino acid sequence which do
not destroy the function of the protein (assayed, e.g.,
as described herein). Preferably, such a sequence is at
least 85%, more preferably 90%, and most preferably 95%
identical at the amino acid level to the sequence of FIG.
5 (SEQ ID NO: 5). For nucleic acids, the length of
comparison sequences will generally be at least 50
nucleotides, preferably at least 60 nucleotides, more
preferably at least 75 nucleotides, and most preferably
110 nucleotides. A "substantially identical" nucleic
acid sequence codes for a substantially identical amino
acid sequence as defined above.
By "transformed cell" is meant a cell into which
(or into an ancestor of which) has been introduced, by
means of recombinant DNA techniques, a DNA molecule
encoding (as used herein) ethb0018f2 protein.

By "positioned for expression" is meant that the
DNA molecule is positioned adjacent to a DNA sequence
which directs transcription and translation of the
sequence (i.e., facilitates the production of ethb0018f2
protein).

By "purified antibody" is meant antibody which is
at least 60%, by weight, free from the proteins and
naturally-occurring organic molecules with which it is
naturally associated. Preferably, the preparation is at
least 75%, more preferably at least 90%, and most
preferably at least 99%, by weight, antibody.

By "specifically binds" is meant an antibody which 
recognizes and binds ethb0018f2 protein but which does 
not substantially recognize and bind other molecules in a 
sample, e.g., a biological sample, which naturally
includes ethb0018f2 protein.

Unless otherwise defined, all technical and 
scientific terms used herein have the same meaning as 
commonly understood by one of ordinary skill in the art 
to which this invention belongs. Although methods and
materials similar or equivalent to those described herein
can be used in the practice or testing of the present 
invention, the preferred methods and materials are 
described below. All publications, patent applications, 
patents, and other references mentioned herein are
incorporated by reference in their entirety. In case of
conflict, the present specification, including 
definitions, will control. In addition, the materials, 
methods, and examples are illustrative only and not 
intended to be limiting. 
Other features and advantages of the invention
will be apparent from the following detailed description 
and from the claims. 



Brief Description of the Drawings 
Figure 1 is a schematic drawing of a portion of
the ptrAP3 vector.

Figure 2 is a representation of the DNA sequence
of the ptrAP3 vector (SEQ ID NO:1). The bold, underlined
portion is the small fragment removed prior to cDNA
insertion. The italic, underlined portion is
the alkaline phosphatase sequence.

Figure 3 is a representation of the amino acid
sequence of human placental alkaline phosphatase
(Accession No. P05187). The underlined portion is the
signal sequence. The bold, underlined portion is the
membrane anchor sequence.

Figure 4 is a representation of the amino acid 
sequence of the alkaline phosphatase encoded by ptrAP3.

Figure 5 is a representation of the cDNA and amino
acid sequence of a portion of a novel secreted protein
identified using the method described in Example 1.

Figure 6 is a representation of an alignment of
the amino acid sequence of clone ethb0018f2 (referred to
here as 8f2) and proteins containing conserved IgG
domains. The proteins are D38492 (neural adhesion
molecule f3); P20241EURO (Drosophila Neuroglian);
P32004EURA (human neural adhesion molecule L1); P35331G-CA
(chick neural adhesion molecule related protein);
Q02246XONI (human Axonin 1); U11031 (rat neural adhesion
molecule BIG1); and X65224 (chicken Neurofascin).
In this figure, conserved motifs within the
IgG domain are highlighted in bold.
Detailed Description
In general terms, the method of the invention 
entails the following steps: 

1. Preparation of a randomly primed cDNA library
using cDNA prepared from mRNA extracted from mammalian
cells or tissue. The cDNA is inserted into a mammalian
expression vector adjacent to a cDNA encoding placental
alkaline phosphatase which lacks a secretory signal.

2. Amplification of the cDNA library in bacteria.

3. Isolation of the cDNA library.

4. Transfection of the resulting cDNA library 
into mammalian cells. 

5. Assay of supernatants from the transfected 
mammalian cells for alkaline phosphatase activity. 

6. Isolation and sequencing of plasmid DNA clones
registering a positive score in the alkaline phosphatase
assay.

7. Isolation of full length cDNA clones of novel 
proteins having a signal sequence. 
The mammalian cDNA used to create the cDNA library
can be prepared using any known method. Generally, the
cDNA is produced from mRNA. The mRNA can be isolated
from any desired tissue or cell type. For example,
peripheral blood cells, primary cells, tumor cells, or
other cells may be used as a source of mRNA.

The expression vector harboring the modified 
alkaline phosphatase gene can be any vector suitable for 
expression of proteins in mammalian cells. 

The mammalian cells used in the transfection step
can be any suitable mammalian cells, e.g., CHO cells,
mouse L cells, HeLa cells, VERO cells, mouse 3T3 cells,
and 293 cells.

Described below is a specific example of the 
method of the invention. Also described below are two 
genes, one known and one novel, identified using this
method.

Example 1

Step 1 Generation of Mammalian Signal Peptide Trap cDNA
Libraries
Vector

A cDNA library was prepared using ptrAP3, a
mammalian expression vector containing a cDNA encoding
human placental alkaline phosphatase (AP) lacking a
signal sequence (FIG. 1 and FIG. 2, SEQ ID NO:1). When
ptrAP3 is transfected into a mammalian cell line, such as
COS7 cells, AP protein is neither expressed nor secreted
since the AP cDNA of ptrAP3 does not encode a
translation initiating methionine, a signal peptide, or a
membrane anchor sequence. FIG. 3 (SEQ ID NO: 2) provides
the amino acid sequence of naturally occurring AP. FIG.
4 (SEQ ID NO: 3) provides the amino acid sequence of the
form of AP encoded by ptrAP3. However, insertion of a
cDNA encoding a signal peptide sequence into ptrAP3 such
that the signal sequence within the cDNA is fused to and
in frame with AP facilitates both the expression and
secretion of AP protein upon transfection of the DNA into
COS7 cells or other mammalian cells. The presence of AP
activity in the supernatants of transfected COS7 cells
therefore indicates the presence of a signal sequence in
the cDNA of interest.

cDNA Synthesis and Ligation 
cDNA for ligation to the ptrAP3 vector was
prepared from messenger RNA isolated from human fetal
brain tissue (Clontech, Palo Alto, CA: Catalog #6525-1)
by a modification of a commercially available "ZAP cDNA
synthesis kit" (Stratagene; La Jolla, CA: Catalog #
200401). Synthesis of cDNA involved the following steps.
(a) Single stranded cDNA was synthesized from 5 μg
of human fetal brain messenger RNA using a random hexamer
primer incorporating an XhoI restriction site
(underlined): 5'-CTGACTCGAGNNNNNN-3' (SEQ ID NO: 4). This
represented a deviation from the Stratagene protocol and
resulted in a population of randomly primed cDNA
molecules. Random priming was employed rather than the
oligo d(T) priming method suggested by Stratagene in
order to generate short cDNA fragments, some of which
would be expected to be copies of mRNA regions that encode
signal sequences.

(b) The single stranded cDNA generated in step (a)
was rendered double stranded, and DNA linkers containing
a free EcoRI overhang were ligated to both ends of the
double stranded cDNAs using reagents and protocols from
the Stratagene ZAP cDNA synthesis kit according to the
manufacturer's instructions.

(c) The linker-adapted double-stranded cDNA
generated in step (b) was digested with XhoI to generate
a free XhoI overhang at the 3' end of the cDNAs using
reagents from the Stratagene ZAP cDNA synthesis kit
according to the manufacturer's instructions.

(d) Linker-adapted double-stranded cDNAs were size
selected by gel filtration through SEPHACRYL™ S-500 cDNA
Size Fractionation Columns (Gibco BRL; Bethesda, MD:
Catalog #18092-015) according to the manufacturer's
instructions.

(e) Size selected, double-stranded cDNAs
containing a free EcoRI overhang at the 5' end and a free
XhoI overhang at the 3' end were ligated to the ptrAP3
backbone which had been digested with EcoRI and XhoI and
purified from the small, released fragment by agarose gel
electrophoresis.

(f) Ligated plasmid DNAs were transformed into E.
coli strain DH10B by electroporation.
This process resulted in a library of cDNA clones
composed of several million random primed cDNAs (some of
which will encode signal sequences) prepared from human
fetal brain messenger RNA, fused to the AP reporter cDNA,
in the mammalian expression vector ptrAP3.
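
As an illustration of the primer used in step (a), the sketch below checks that the primer of SEQ ID NO: 4 carries the XhoI recognition sequence (CTCGAG) upstream of its six degenerate positions and counts how many distinct primers the degenerate positions represent. The function names are hypothetical, and the random draw is only a toy model of one member of the primer pool.

```python
import random

XHOI_SITE = "CTCGAG"                  # XhoI recognition sequence
PRIMER_TEMPLATE = "CTGACTCGAGNNNNNN"  # SEQ ID NO: 4; N = any of A, C, G, T


def degeneracy(template: str) -> int:
    """Number of distinct sequences represented by a template containing N."""
    return 4 ** template.count("N")


def random_instance(template: str, rng: random.Random) -> str:
    """Draw one concrete primer by filling each N with a random base."""
    return "".join(
        rng.choice("ACGT") if base == "N" else base for base in template
    )


site_start = PRIMER_TEMPLATE.find(XHOI_SITE)    # 0-based position of the site
pool_size = degeneracy(PRIMER_TEMPLATE)         # 4**6 distinct hexamer tails
example_primer = random_instance(PRIMER_TEMPLATE, random.Random(0))
```

Because the restriction site sits 5' of the random hexamer, every cDNA primed this way carries an XhoI site at the end corresponding to the primer, which is what allows the directional cloning described in steps (c) and (e).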

Step 2 Plating and Automated Picking of Bacterial 
Colonies 

Next, the transformed bacterial cells were plated, 
and individual clones were identified. A sample of
transformed E. coli containing the random primed human
fetal brain cDNA library described in Step 1 was plated 
for growth as individual colonies, using standard 
procedures. Each E. coli colony contained an individual 
cDNA clone fused to the AP reporter in the ptrAP3
expression vector. Approximately 20,000 such E. coli
colonies were plated, representing approximately 0.5% of
the total cDNA library. 
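
The figures above imply an overall library size and a plate count, which can be worked out as follows. This is a rough arithmetic sketch; the function names are illustrative, and the plate count assumes every well of every 96 well plate is used, which the text does not state.

```python
import math


def estimated_library_size(colonies_plated: int, percent_of_library: float) -> float:
    """Total clones implied when a sample is a given percentage of the library."""
    return colonies_plated * 100 / percent_of_library


def plates_needed(colonies: int, wells_per_plate: int = 96) -> int:
    """Minimum number of plates needed to hold one colony per well."""
    return math.ceil(colonies / wells_per_plate)


library_size = estimated_library_size(20_000, 0.5)  # about 4,000,000 clones
plate_count = plates_needed(20_000)                  # 209 plates of 96 wells
```

The estimate of roughly four million clones is consistent with the "several million" random primed cDNAs mentioned at the end of Step 1.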

Next, E. coli colonies were picked from the plates
and inoculated into deep well 96 well plates containing 1
ml of growth medium prepared by standard procedures, and
the E. coli cultures were grown overnight by standard
procedures. Each plate
was identified by number. Within each plate, each well
contained an individual cDNA clone in the ptrAP3 vector
identified by well position.
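
Tracking clones by plate number and well position can be sketched as a simple index calculation. The example below assumes a standard 8 row by 12 column layout filled row by row, with plates numbered from 1; the fill order, the label format, and the function name are assumptions made for illustration and are not taken from the text.

```python
ROWS = "ABCDEFGH"      # 8 rows of a 96 well plate
COLUMNS = 12           # 12 columns of a 96 well plate
WELLS_PER_PLATE = len(ROWS) * COLUMNS


def plate_and_well(colony_index: int) -> tuple[int, str]:
    """Map a zero-based colony index to (plate number, well label).

    Assumes wells are filled row by row (A01..A12, B01..B12, ...).
    """
    if colony_index < 0:
        raise ValueError("colony index must be non-negative")
    plate, offset = divmod(colony_index, WELLS_PER_PLATE)
    row, column = divmod(offset, COLUMNS)
    return plate + 1, f"{ROWS[row]}{column + 1:02d}"
```

Keeping the same plate number and well label through the bacterial culture, plasmid preparation, and transfection plates is what lets a positive well in the later assay be traced back to its bacterial clone.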

Finally, plasmid DNA was extracted from the 
overnight E. coli cultures using a semi-automated 96-well 
plasmid DNA miniprep procedure, employing standard 
procedures for bacterial lysis, genomic DNA precipitation
and plasmid DNA purification.

The plasmid DNA extraction was performed as 
follows: 

(a) E. coli were centrifuged for 20 minutes using 
a Beckman Centrifuge at 3200 rpm.
(b) Supernatant was discarded and E. coli pellets
were resuspended in 130 μl WP1 (50 mM TRIS (pH 7.5), 10
mM EDTA, 100 μg/ml RNase A) resuspension solution using a
TITERTEK MULTIDROP™ apparatus.

(c) E. coli pellets were resuspended by vortexing.

(d) 130 μl WP2 (0.2 M NaOH, 0.5% SDS) lysing
solution was added to each well, and the samples were
mixed by vortexing for 5 seconds.

(e) 130 μl WP3 (125 mM potassium acetate, pH 4.8)
neutralizing solution was added to each well, and the
samples were mixed by vortexing for 5 seconds.

(f) Samples were placed on ice for 15 minutes, 
mixed by vortexing for 5 seconds, and recentrifuged for
10 minutes at 3200 rpm in a Beckman Centrifuge. 

(g) Supernatant (crude DNA extract) was
transferred from each well of each 96 well plate into a
96 well filter plate (Polyfiltronics) using a
TOMTEC/Quadra 96™ transfer apparatus.

(h) 480 μl of Wizard™ Midiprep DNA Purification
Resin (Promega) was added to each well of each plate
containing crude DNA extract using a Titertek Multidrop
apparatus and the samples were left for 5 minutes.

(i) Each 96 well filter plate was placed on a
vacuum housing (Polyfiltronics) and the liquid in each
well was removed by suction generated by vacuum created
with a Lab Port Vacuum pump.

(j) The Wizard Midiprep DNA Purification Resin in 
each well (to which plasmid DNA was bound) was washed 
four times with 600 μl of Wizard Wash™.

(k) Plates were centrifuged for 5 minutes to
remove excessive moisture from the Wizard Midiprep DNA
Purification Resin.

(l) Purified plasmid DNAs were eluted from the
Wizard Midiprep DNA Purification Resin into collection
plates by addition of 50 μl deionized water to each well
using a Multidrop 8 Channel Pipette, incubation at room
temperature for 15 minutes, and centrifugation for 5
minutes (3200 rpm, Beckman centrifuge).

This process resulted in preparation of plasmid
DNA contained in 96 well plates with each well containing
an individual cDNA clone ligated in the ptrAP3 expression
vector. Individual clones were identified by plate
number and well position.

Step 4 Transfection of DNAs into COS7 cells
To determine which of the cDNA clones contained
within the cDNA library encoded functional signal
peptides, individual plasmid DNA preparations were
transfected into COS7 cells as follows.

For each 96 well plate of DNA preparations, one 96
well tissue culture plate containing approximately 10,000
COS7 cells per well was prepared using standard
procedures.

Immediately prior to DNA transfection, the COS7
cell culture medium in each well of each 96 well plate
was replaced with 80 μl of Opti-MEM (Gibco-BRL; catalog
#31985-021) containing 1 μl of Lipofectamine (Gibco-BRL)
and 2 μl (approximately 100-200 ng) of DNA prepared as
described above. Thus, each well of each 96 well plate
containing COS7 cells received DNA representing one
individual cDNA clone from the cDNA library in ptrAP3.
The COS7 cells were incubated with the Opti-
MEM/Lipofectamine/DNA mixture overnight to allow
transfection of cells with the plasmid DNAs.

After overnight incubation, the transfection
medium was removed from the cells and replaced with 80 μl
fresh medium composed of Opti-MEM + 1% fetal calf serum.
Cells were incubated overnight.
Step 5 Alkaline Phosphatase Assay

The secreted alkaline phosphatase activity of the
transfected COS7 cells was measured as follows. Samples
(10 μl) of supernatants from the transfected COS7 cells
were transferred from each well of each 96 well plate
into one well of a Microfluor scintillation plate
(Dynatech: Location Catalog #011-010-7805). AP activity
in the supernatants was determined using the Phospha-
Light Kit (Tropix Inc.; catalog #BP300). AP assays were
performed according to the manufacturer's instructions
using a Wallac Micro-Beta scintillation counter.
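
The text does not state how a well is scored as positive. Purely as an illustration, the sketch below flags wells whose reading exceeds a multiple of the median signal from negative-control wells (for example, wells transfected with ptrAP3 lacking an insert). The control scheme, the five-fold cutoff, the function name, and the readings are all hypothetical and are not taken from the described assay.

```python
from statistics import median


def positive_wells(
    readings: dict[str, float],
    negative_controls: list[float],
    fold_over_background: float = 5.0,  # hypothetical cutoff, not from the text
) -> list[str]:
    """Return well labels whose signal exceeds a multiple of background."""
    if not negative_controls:
        raise ValueError("at least one negative-control reading is required")
    cutoff = fold_over_background * median(negative_controls)
    return sorted(well for well, value in readings.items() if value > cutoff)


# Invented luminescence counts for three wells and three control wells.
example_readings = {"A01": 120.0, "A02": 5000.0, "A03": 95.0}
example_controls = [100.0, 110.0, 90.0]
hits = positive_wells(example_readings, example_controls)
```

With these invented numbers the background median is 100, the cutoff is 500, and only well A02 is flagged.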

Step 6 Sequencing and Analysis of Positive Clones 

The individual plasmid DNAs scoring positive in
the COS7 cell AP secretion assay were analyzed further by
DNA sequencing using standard procedures. The resulting
DNA sequence information was used to perform BLAST
sequence similarity searches of nucleotide and protein
databases to ascertain whether the clone in question
encodes either 1) a known secreted or membrane-associated
protein possessing a signal sequence, or 2) a putative
novel secreted or membrane-associated protein possessing
a putative novel signal sequence.

Identification of the Protein Tyrosine Phosphatase Sigma
(PTPσ) Signal Sequence by Mammalian Signal Peptide trAP

Employing the method described in Example 1, a
cDNA clone designated ethb005c07 was found to score
positive in the COS7 cell transfection AP assay. BLAST
similarity searching with the DNA sequence from this
clone identified ethb005c07 as a cDNA encoding the signal
sequence of protein tyrosine phosphatase sigma (PTPσ), a
previously described protein that is well established in
the scientific literature to be a transmembrane protein
(Pulido et al., Proc. Nat'l Acad. Sci. USA 92:11686,
1995).

Identification of a Novel Immunoglobulin Domain
Containing Protein by Mammalian Signal Peptide trAP
Employing the method described in Example 1, a
cDNA clone designated ethb0018f2 was found to score
positive in the COS7 cell transfection AP assay. DNA
sequencing revealed that ethb0018f2 harbors a 1455 base
pair cDNA having a single open reading frame commencing
at nucleotide 55 and continuing to nucleotide 1455.
Thus, the ethb0018f2 cDNA encodes a 467 amino acid open
reading frame (FIG. 5, SEQ ID NO: 5) fused to the AP
reporter. Inspection of the ethb0018f2 protein sequence
revealed the presence of a putative signal sequence
from amino acid 1 to amino acid 20, predicted by the signal
peptide prediction algorithm SignalP (von Heijne,
Nucleic Acids Res. 14:4683-90, 1986). Thus, ethb0018f2
is a partial clone encoding a novel putative
secreted/membrane protein. BLAST similarity searching of
nucleic acid and protein databases with the ethb0018f2
DNA sequence from this clone revealed similarity to a
family of proteins known to contain a protein motif
referred to as an Immunoglobulin or IgG domain.
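
The stated length of the open reading frame follows directly from the nucleotide coordinates given above. The short check below uses 1-based, inclusive coordinates as in the text; the function name is illustrative only.

```python
def orf_length_in_codons(first_nucleotide: int, last_nucleotide: int) -> int:
    """Codon count of an ORF given 1-based, inclusive nucleotide coordinates."""
    length = last_nucleotide - first_nucleotide + 1
    if length <= 0:
        raise ValueError("last nucleotide must not precede the first")
    if length % 3 != 0:
        raise ValueError("ORF length is not a whole number of codons")
    return length // 3


codons = orf_length_in_codons(55, 1455)  # (1455 - 55 + 1) / 3
```

Nucleotides 55 through 1455 span 1401 bases, or 467 codons. Because the reading frame runs to the last nucleotide of the insert without a stop codon, it can continue in frame into the AP reporter.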

Further visual inspection of the ethb0018f2
protein sequence resulted in the identification of 5
consecutive IgG repeats, defined by a conserved spacing
of cysteine, tryptophan, tyrosine, and cysteine residues
(FIG. 5).

FIG. 6 is a depiction of a protein sequence
alignment between clone ethb0018f2 (referred to as 8f2)
and seven related proteins known to contain IgG domains
that are also known to be expressed in the brain. These
proteins are rat neural adhesion molecule f3 (D38492),
Drosophila Neuroglian (P20241), human neural adhesion
molecule L1 (P32004), chick neural adhesion molecule
related protein (P35331), human Axonin 1 (Q02246), rat neural
adhesion molecule BIG1 (U11031) and chicken Neurofascin
(X65224). Given this sequence similarity, it is likely
that clone ethb0018f2 represents a partial cDNA clone
encoding a novel protein, expressed in the brain,
which contains multiple, consecutive IgG domains.
Specifically, since the closest relatives of clone
ethb0018f2 are believed to function as neural adhesion
molecules, it is likely that clone ethb0018f2 represents
a partial cDNA clone of a novel neural adhesion molecule.

Other Embodiments 
It is to be understood that while the invention
has been described in conjunction with the detailed
description thereof, the foregoing description is
intended to illustrate and not limit the scope of the
invention, which is defined by the scope of the appended
claims.
SEQUENCE LISTING

(1) GENERAL INFORMATION 
(i) APPLICANT: Millennium Biotherapeutics, Inc. 

(ii) TITLE OF THE INVENTION: METHOD FOR IDENTIFYING GENES
ENCODING NOVEL SECRETED OR MEMBRANE-ASSOCIATED PROTEIN

(iii) NUMBER OF SEQUENCES: 14 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson, P.C. 

(B) STREET: 225 Franklin Street 

(C) CITY: Boston 

(D) STATE: MA

(E) COUNTRY: US 

(F) ZIP: 02110-2804 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: Windows95 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: PCT/US97 / 

(B) FILING DATE: 04-NOV-1997

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/752,307 

(B) FILING DATE: 19-NOV-1996 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Meiklejohn, Ph.D., Anita L. 

(B) REGISTRATION NUMBER: 35,283 

(C) REFERENCE/DOCKET NUMBER: 09404/020WO1

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 617-542-5070

(B) TELEFAX: 617-542-8906 

(C) TELEX: 200154 



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 4951 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

AAGCTTGGCT GTGGAATGTG TGTCAGTTAG GGTGTGGAAA GTCCCCAGGC TCCCCAGCAG 60

GCAGAAGTAT GCAAAGCATG CATCTCAATT AGTCAGCAAC CAGGTGTGGA AAGTCCCCAG 120 

GCTCCCCAGC AGGCAGAAGT ATGCAAAGCA TGCATCTCAA TTAGTCAGCA ACCATAGTCC 180 

CGCCCCTAAC TCCGCCCATC CCGCCCCTAA CTCCGCCCAG TTCCGCCCAT TCTCCGCCCC 240 

ATGGCTGACT AATTTTTTTT ATTTATGCAG AGGCCGAGGC CGCCTCGGCC TCTGAGCTAT 300 

TCCAGAAGTA GTGAGGAGGC TTTTTTGGAG GCCTAGGCTT TTGCAAAAAG CTCCTCCGAT 360 

CGAGGGGCTC GCATCTCTCC TTCACGCGCC CGCCGCCCTA CCTGAGGCCG CCATCCACGC 420 

CGGTTGAGTC GCGTTCTGCC GCCTCCCGCC TGTGGTGCCT CCTGAACTGC GTCCGCCGTC 480 

TAGGTAAGTT TAAAGCTCAG GTCGAGACCG GGCCTTTGTC CGGCGCTCCC TTGGAGCCTA 540 

CCTAGACTCA GCCGGCTCTC CACGCTTTGC CTGACCCTGC TTGCTCAACT CTACGTCTTT 600 

GTTTCGTTTT CTGTTCTGCG CCGTTACAGA TCCAAGCTCT GAAAAACCAG AAAGTTAACT 660 

GGTAAGTTTA GTCTTTTTGT CTTTTATTTC AGGTCCCAGG TCCCGGATCC GGTGATCCAA 720 

ATCTAAGAAC TGCTCCTCAG TGAGTGTTGC CTTTACTTCT AGGCCTGTAC GGAAGTGTTA 780 

CTTCTGCTCT AAAAGCTGCG GAATTCGCAC CACCGTAGTT TTTACGCCCG GTGAGCGCTC 840

CACCCGCACC TACAAGCGCG TGTATGATGA GGTGTACGGC GACGAGGACC TGCTTGAGCA 900

GGCCAACGAG CGCCTCGGGG AGTTTGCCTA CGGAAAGCGG CATAAGGACA TGTTGGCGTT 960

GCCGCTGGAC GAGGGCAACC CAACACCTAG CCTAAAGCCC GTGACACTGC AGCAGGTGCT 1020

GCCCACGCTT GCACCGTCCG AAGAAAAGCG CGGCCTAAAG CGCGAGTCTG GTGACTTGGC 1080

ACCCACCGTG CAGCTGATGG TACCCAAGCG CCAGCGACTG GAAGATGTCT TGGAAAAAAT 1140

GACCGTGGAG CCTGGGCTGG AGCCCGAGGT CCGCGTGCGG CCAATCAAGC AGGTGGCACC 1200

GGGACTGGGC GTGCAGACCG TGGACGTTCA GATACCCACC ACCAGTAGCA CTAGTATTGC 1260

CACTGCCACA GAGGGCATGG AGACACAAAC GTCCCCGGTT GCCTAGCTCG AGATCATCCC 1320

AGTTGAGGAG GAGAACCCGG ACTTCTGGAA CCGCGAGGCA GCCGAGGCCC TGGGTGCCGC 1380

CAAGAAGCTG CAGCCTGCAC AGACAGCCGC CAAGAACCTC ATCATCTTCC TGGGCGATGG 1440

GATGGGGGTG TCTACGGTGA CAGCTGCCAG GATCCTAAAA GGGCAGAAGA AGGACAAACT 1500

GGGGCCTGAG ATACCCCTGG CCATGGACCG CTTCCCATAT GTGGCTCTGT CCAAGACATA 1560

CAATGTAGAC AAACATGTGC CAGACAGTGG AGCCACAGCC ACGGCCTACC TGTGCGGGGT 1620

CAAGGGCAAC TTCCAGACCA TTGGCTTGAG TGCAGCCGCC CGCTTTAACC AGTGCAACAC 1680

GACACGCGGC AACGAGGTCA TCTCCGTGAT GAATCGGGCC AAGAAAGCAG GGAAGTCAGT 1740

GGGAGTGGTA ACCACCACAC GAGTGCAGCA CGCCTCGCCA GCCGGCACCT ACGCCCACAC 1800

GGTGAACCGC AACTGGTACT CGGACGCCGA CGTGCCTGCC TCGGCCCGCC AGGAGGGGTG 1860

CCAGGACATC GCTACGCAGC TCATCTCCAA CATGGACATT GACGTGATCC TAGGTGGAGG 1920

CCGAAAGTAC ATGTTTCGCA TGGGAACCCC AGACCCTGAG TACCCAGATG ACTACAGCCA 1980

AGGTGGGACC AGGCTGGACG GGAAGAATCT GGTGCAGGAA TGGCTGGCGA AGCGCCAGGG 2040

TGCCCGGTAT GTGTGGAACC GCACTGAGCT CATGCAGGCT TCCCTGGACC CGTCTGTGAC 2100

CCATCTCATG GGTCTCTTTG AGCCTGGAGA CATGAAATAC GAGATCCACC GAGACTCCAC 2160

ACTGGACCCC TCCCTGATGG AGATGACAGA GGCTGCCCTG CGCCTGCTGA GCAGGAACCC 2220

CCGCGGCTTC TTCCTCTTCG TGGAGGGTGG TCGCATCGAC CATGGTCATC ATGAAAGCAG 2280

GGCTTACCGG GCACTGACTG AGACGATCAT GTTCGACGAC GCCATTGAGA GGGCGGGCCA 2340

GCTCACCAGC GAGGAGGACA CGCTGAGCCT CGTCACTGCC GACCACTCCC ACGTCTTCTC 2400

CTTCGGAGGC TACCCCCTGC GAGGGAGCTC CATCTTCGGG CTGGCCCCTG GCAAGGCCCG 2460

GGACAGGAAG GCCTACACGG TCCTCCTATA CGGAAACGGT CCAGGCTATG TGCTCAAGGA 2520

CGGCGCCCGG CCGGATGTTA CCGAGAGCGA GAGCGGGAGC CCCGAGTATC GGCAGCAGTC 2580

AGCAGTGCCC CTGGACGAAG AGACCCACGC AGGCGAGGAC GTGGCGGTGT TCGCGCGCGG 2640

CCCGCAGGCG CACCTGGTTC ACGGCGTGCA GGAGCAGACC TTCATAGCGC ACGTCATGGC 2700

CTTCGCCGCC TGCCTGGAGC CCTACACCGC CTGCGACCTG GCGCCCCCCG CCGGCACCAC 2760

CGACGCCGCG CACCCGGGTT GAACTAGTCT AGAGAAAAAA CCTCCCACAC CTCCCCCTGA 2820

ACCTGAAACA TAAAATGAAT GCAATTGTTG TTGTTAACTT GTTTATTGCA GCTTATAATG 2880

GTTACAAATA AAGCAATAGC ATCACAAATT TCACAAATAA AGCATTTTTT TCACTGCATT 2940

CTAGTTGTGG TTTGTCCAAA CTCATCAATG TATCTTATCA TGTCTGGATC CCCGGGTACC 3000

GAGCTCGAAT TAATTCCTCT TCCGCTTCCT CGCTCACTGA CTCGCTGCGC TCGGTCGTTC 3060

GGCTGCGGCG AGCGGTATCA GCTCACTCAA AGGCGGTAAT ACGGTTATCC ACAGAATCAG 3120

GGGATAACGC AGGAAAGAAC ATGTGAGCAA AAGGCCAGCA AAAGGCCAGG AACCGTAAAA 3180

AGGCCGCGTT GCTGGCGTTT TTCCATAGGC TCCGCCCCCC TGACGAGCAT CACAAAAATC 3240

GACGCTCAAG TCAGAGGTGG CGAAACCCGA CAGGACTATA AAGATACCAG GCGTTTCCCC 3300

CTGGAAGCTC CCTCGTGCGC TCTCCTGTTC CGACCCTGCC GCTTACCGGA TACCTGTCCG 3360

CCTTTCTCCC TTCGGGAAGC GTGGCGCTTT CTCAATGCTC ACGCTGTAGG TATCTCAGTT 3420

CGGTGTAGGT CGTTCGCTCC AAGCTGGGCT GTGTGCACGA ACCCCCCGTT CAGCCCGACC 3480

GCTGCGCCTT ATCCGGTAAC TATCGTCTTG AGTCCAACCC GGTAAGACAC GACTTATCGC 3540

CACTGGCAGC AGCCACTGGT AACAGGATTA GCAGAGCGAG GTATGTAGGC GGTGCTACAG 3600

AGTTCTTGAA GTGGTGGCCT AACTACGGCT ACACTAGAAG GACAGTATTT GGTATCTGCG 3660

CTCTGCTGAA GCCAGTTACC TTCGGAAAAA GAGTTGGTAG CTCTTGATCC GGCAAACAAA 3720

CCACCGCTGG TAGCGGTGGT TTTTTTGTTT GCAAGCAGCA GATTACGCGC AGAAAAAAAG 3780

GATCTCAAGA AGATCCTTTG ATCTTTTCTA CGGGGTCTGA CGCTCAGTGG AACGAAAACT 3840

CACGTTAAGG GATTTTGGTC ATGAGATTAT CAAAAAGGAT CTTCACCTAG ATCCTTTTAA 3900

ATTAAAAATG AAGTTTTAAA TCAATCTAAA GTATATATGA GTAAACTTGG TCTGACAGTT 3960

ACCAATGCTT AATCAGTGAG GCACCTATCT CAGCGATCTG TCTATTTCGT TCATCCATAG 4020

TTGCCTGACT CCCCGTCGTG TAGATAACTA CGATACGGGA GGGCTTACCA TCTGGCCCCA 4080

GTGCTGCAAT GATACCGCGA GACCCACGCT CACCGGCTCC AGATTTATCA GCAATAAACC 4140

AGCCAGCCGG AAGGGCCGAG CGCAGAAGTG GTCCTGCAAC TTTATCCGCC TCCATCCAGT 4200

CTATTAATTG TTGCCGGGAA GCTAGAGTAA GTAGTTCGCC AGTTAATAGT TTGCGCAACG 4260

TTGTTGCCAT TGCTACAGGC ATCGTGGTGT CACGCTCGTC GTTTGGTATG GCTTCATTCA 4320

GCTCCGGTTC CCAACGATCA AGGCGAGTTA CATGATCCCC CATGTTGTGC AAAAAAGCGG 4380

TTAGCTCCTT CGGTCCTCCG ATCGTTGTCA GAAGTAAGTT GGCCGCAGTG TTATCACTCA 4440

TGGTTATGGC AGCACTGCAT AATTCTCTTA CTGTCATGCC ATCCGTAAGA TGCTTTTCTG 4500

TGACTGGTGA GTACTCAACC AAGTCATTCT GAGAATAGTG TATGCGGCGA CCGAGTTGCT 4560

CTTGCCCGGC GTCAATACGG GATAATACCG CGCCACATAG CAGAACTTTA AAAGTGCTCA 4620

TCATTGGAAA ACGTTCTTCG GGGCGAAAAC TCTCAAGGAT CTTACCGCTG TTGAGATCCA 4680

GTTCGATGTA ACCCACTCGT GCACCCAACT GATCTTCAGC ATCTTTTACT TTCACCAGCG 4740

TTTCTGGGTG AGCAAAAACA GGAAGGCAAA ATGCCGCAAA AAAGGGAATA AGGGCGACAC 4800

GGAAATGTTG AATACTCATA CTCTTCCTTT TTCAATATTA TTGAAGCATT TATCAGGGTT 4860

ATTGTCTCAT GAGCGGATAC ATATTTGAAT GTATTTAGAA AAATAAACAA ATAGGGGTTC 4920

CGCGCACATT TCCCCGAAAA GTGCCACCTG C 4951

(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 530 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 



Met Leu Leu Leu Leu Leu Leu Leu Gly Leu Arg Leu Gln Leu Ser Leu
1 5 10 15

Gly Ile Ile Pro Val Glu Glu Glu Asn Pro Asp Phe Trp Asn Arg Glu
20 25 30

Ala Ala Glu Ala Leu Gly Ala Ala Lys Lys Leu Gln Pro Ala Gln Thr
35 40 45

Ala Ala Lys Asn Leu Ile Ile Phe Leu Gly Asp Gly Met Gly Val Ser
50 55 60

Thr Val Thr Ala Ala Arg Ile Leu Lys Gly Gln Lys Lys Asp Lys Leu
65 70 75 80

Gly Pro Glu Ile Pro Leu Ala Met Asp Arg Phe Pro Tyr Val Ala Leu
85 90 95

Ser Lys Thr Tyr Asn Val Asp Lys His Val Pro Asp Ser Gly Ala Thr
100 105 110

Ala Thr Ala Tyr Leu Cys Gly Val Lys Gly Asn Phe Gln Thr Ile Gly
115 120 125

Leu Ser Ala Ala Ala Arg Phe Asn Gln Cys Asn Thr Thr Arg Gly Asn
130 135 140

Glu Val Ile Ser Val Met Asn Arg Ala Lys Lys Ala Gly Lys Ser Val
145 150 155 160

Gly Val Val Thr Thr Thr Arg Val Gln His Ala Ser Pro Ala Gly Thr
165 170 175

Tyr Ala His Thr Val Asn Arg Asn Trp Tyr Ser Asp Ala Asp Val Pro
180 185 190

Ala Ser Ala Arg Gln Glu Gly Cys Gln Asp Ile Ala Thr Gln Leu Ile
195 200 205

Ser Asn Met Asp Ile Asp Val Ile Leu Gly Gly Gly Arg Lys Tyr Met
210 215 220

Phe Arg Met Gly Thr Pro Asp Pro Glu Tyr Pro Asp Asp Tyr Ser Gln
225 230 235 240

Gly Gly Thr Arg Leu Asp Gly Lys Asn Leu Val Gln Glu Trp Leu Ala
245 250 255

Lys Arg Gln Gly Ala Arg Tyr Val Trp Asn Arg Thr Glu Leu Met Gln
260 265 270

Ala Ser Leu Asp Pro Ser Val Thr His Leu Met Gly Leu Phe Glu Pro
275 280 285

Gly Asp Met Lys Tyr Glu Ile His Arg Asp Ser Thr Leu Asp Pro Ser
290 295 300

Leu Met Glu Met Thr Glu Ala Ala Leu Arg Leu Leu Ser Arg Asn Pro
305 310 315 320

Arg Gly Phe Phe Leu Phe Val Glu Gly Gly Arg Ile Asp His Gly His
325 330 335

His Glu Ser Arg Ala Tyr Arg Ala Leu Thr Glu Thr Ile Met Phe Asp
340 345 350

Asp Ala Ile Glu Arg Ala Gly Gln Leu Thr Ser Glu Glu Asp Thr Leu
355 360 365

Ser Leu Val Thr Ala Asp His Ser His Val Phe Ser Phe Gly Gly Tyr
370 375 380

Pro Leu Arg Gly Ser Ser Ile Phe Gly Leu Ala Pro Gly Lys Ala Arg
385 390 395 400

Asp Arg Lys Ala Tyr Thr Val Leu Leu Tyr Gly Asn Gly Pro Gly Tyr
405 410 415

Val Leu Lys Asp Gly Ala Arg Pro Asp Val Thr Glu Ser Glu Ser Gly
420 425 430

Ser Pro Glu Tyr Arg Gln Gln Ser Ala Val Pro Leu Asp Glu Glu Thr
435 440 445

His Ala Gly Glu Asp Val Ala Val Phe Ala Arg Gly Pro Gln Ala His
450 455 460

Leu Val His Gly Val Gln Glu Gln Thr Phe Ile Ala His Val Met Ala
465 470 475 480

Phe Ala Ala Cys Leu Glu Pro Tyr Thr Ala Cys Asp Leu Ala Pro Pro
485 490 495

Ala Gly Thr Thr Asp Ala Ala His Pro Gly Arg Ser Val Val Pro Ala
500 505 510

Leu Leu Pro Leu Leu Ala Gly Thr Leu Leu Leu Leu Glu Thr Ala Thr
515 520 525

Ala Pro
530



(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 489 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 

Ile Ile Pro Val Glu Glu Glu Asn Pro Asp Phe Trp Asn Arg Glu Ala
1 5 10 15

Ala Glu Ala Leu Gly Ala Ala Lys Lys Leu Gln Pro Ala Gln Thr Ala
20 25 30

Ala Lys Asn Leu Ile Ile Phe Leu Gly Asp Gly Met Gly Val Ser Thr
35 40 45

Val Thr Ala Ala Arg Ile Leu Lys Gly Gln Lys Lys Asp Lys Leu Gly
50 55 60

Pro Glu Ile Pro Leu Ala Met Asp Arg Phe Pro Tyr Val Ala Leu Ser
65 70 75 80

Lys Thr Tyr Asn Val Asp Lys His Val Pro Asp Ser Gly Ala Thr Ala
85 90 95

Thr Ala Tyr Leu Cys Gly Val Lys Gly Asn Phe Gln Thr Ile Gly Leu
100 105 110

Ser Ala Ala Ala Arg Phe Asn Gln Cys Asn Thr Thr Arg Gly Asn Glu
115 120 125

Val Ile Ser Val Met Asn Arg Ala Lys Lys Ala Gly Lys Ser Val Gly
130 135 140

Val Val Thr Thr Thr Arg Val Gln His Ala Ser Pro Ala Gly Thr Tyr
145 150 155 160

Ala His Thr Val Asn Arg Asn Trp Tyr Ser Asp Ala Asp Val Pro Ala
165 170 175

Ser Ala Arg Gln Glu Gly Cys Gln Asp Ile Ala Thr Gln Leu Ile Ser
180 185 190

Asn Met Asp Ile Asp Val Ile Leu Gly Gly Gly Arg Lys Tyr Met Phe
195 200 205

Arg Met Gly Thr Pro Asp Pro Glu Tyr Pro Asp Asp Tyr Ser Gln Gly
210 215 220

Gly Thr Arg Leu Asp Gly Lys Asn Leu Val Gln Glu Trp Leu Ala Lys
225 230 235 240

Arg Gln Gly Ala Arg Tyr Val Trp Asn Arg Thr Glu Leu Met Gln Ala
245 250 255

Ser Leu Asp Pro Ser Val Thr His Leu Met Gly Leu Phe Glu Pro Gly
260 265 270

Asp Met Lys Tyr Glu Ile His Arg Asp Ser Thr Leu Asp Pro Ser Leu
275 280 285

Met Glu Met Thr Glu Ala Ala Leu Arg Leu Leu Ser Arg Asn Pro Arg
290 295 300

Gly Phe Phe Leu Phe Val Glu Gly Gly Arg Ile Asp His Gly His His
305 310 315 320

Glu Ser Arg Ala Tyr Arg Ala Leu Thr Glu Thr Ile Met Phe Asp Asp
325 330 335

Ala Ile Glu Arg Ala Gly Gln Leu Thr Ser Glu Glu Asp Thr Leu Ser
340 345 350

Leu Val Thr Ala Asp His Ser His Val Phe Ser Phe Gly Gly Tyr Pro
355 360 365

Leu Arg Gly Ser Ser Ile Phe Gly Leu Ala Pro Gly Lys Ala Arg Asp
370 375 380

Arg Lys Ala Tyr Thr Val Leu Leu Tyr Gly Asn Gly Pro Gly Tyr Val
385 390 395 400

Leu Lys Asp Gly Ala Arg Pro Asp Val Thr Glu Ser Glu Ser Gly Ser
405 410 415

Pro Glu Tyr Arg Gln Gln Ser Ala Val Pro Leu Asp Glu Glu Thr His
420 425 430

Ala Gly Glu Asp Val Ala Val Phe Ala Arg Gly Pro Gln Ala His Leu
435 440 445

Val His Gly Val Gln Glu Gln Thr Phe Ile Ala His Val Met Ala Phe
450 455 460

Ala Ala Cys Leu Glu Pro Tyr Thr Ala Cys Asp Leu Ala Pro Pro Ala
465 470 475 480

Gly Thr Thr Asp Ala Ala His Pro Gly
485

















(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 17 base pairs 

(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
CTGGACTCGA GNNNNNN 17 
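
SEQ ID NO: 4 ends in six positions written as N, the IUPAC symbol for any of the four bases. As an illustrative sketch only (the helper names and the concrete test primer are hypothetical), the following Python snippet counts how many distinct oligonucleotides such a degenerate sequence describes and checks whether a given concrete sequence is one of them.

```python
import re

# IUPAC nucleotide symbols used in SEQ ID NO: 4, mapped to the bases they stand for.
# Only the symbols needed here are included; this is not the full IUPAC table.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "ACGT"}


def degeneracy(seq: str) -> int:
    """Number of distinct concrete sequences represented by a degenerate sequence."""
    count = 1
    for symbol in seq:
        count *= len(IUPAC[symbol])
    return count


def to_regex(seq: str) -> "re.Pattern[str]":
    """Compile a degenerate sequence into a regular expression over A, C, G, T."""
    return re.compile("".join(f"[{IUPAC[symbol]}]" for symbol in seq))


SEQ_ID_NO_4 = "CTGGACTCGAGNNNNNN"  # spacing from the listing removed

print(len(SEQ_ID_NO_4))         # 17
print(degeneracy(SEQ_ID_NO_4))  # 4096, i.e. 4 ** 6
print(bool(to_regex(SEQ_ID_NO_4).fullmatch("CTGGACTCGAGACGTAC")))  # True
```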
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 465 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(v) FRAGMENT TYPE: internal 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 



Met Trp Leu Val Thr Phe Leu Leu Leu Leu Asp Ser Leu His Lys Ala
1 5 10 15

Arg Pro Glu Asp Val Gly Thr Ser Leu Tyr Phe Val Asn Asp Ser Leu
20 25 30

Gln Gln Val Thr Phe Ser Ser Ser Val Gly Val Val Val Pro Cys Pro
35 40 45

Ala Ala Gly Ser Pro Ser Ala Ala Leu Arg Trp Tyr Leu Ala Thr Gly
50 55 60

Asp Asp Ile Tyr Asp Val Pro His Ile Arg His Val His Ala Asn Gly
65 70 75 80

Thr Leu Gln Leu Tyr Pro Phe Ser Pro Ser Ala Phe Asn Ser Phe Ile
85 90 95

His Asp Asn Asp Tyr Phe Cys Thr Ala Glu Asn Ala Ala Gly Lys Ile
100 105 110

Arg Ser Pro Asn Ile Arg Val Lys Ala Val Phe Arg Glu Pro Tyr Thr
115 120 125

Val Arg Val Glu Asp Gln Arg Ser Met Arg Gly Asn Val Ala Val Phe
130 135 140

Lys Cys Leu Ile Pro Ser Ser Val Gln Glu Tyr Val Ser Val Val Ser
145 150 155 160

Trp Glu Lys Asp Thr Val Ser Ile Ile Pro Glu Asn Arg Phe Phe Ile
165 170 175

Thr Tyr His Gly Gly Leu Tyr Ile Ser Asp Val Gln Lys Glu Asp Ala
180 185 190

Leu Ser Thr Tyr Arg Cys Ile Thr Lys His Lys Tyr Ser Gly Glu Thr
195 200 205

Arg Gln Ser Asn Gly Ala Arg Leu Ser Val Thr Asp Pro Ala Glu Ser
210 215 220

Ile Pro Thr Ile Leu Asp Gly Phe His Ser Gln Glu Val Trp Ala Gly
225 230 235 240

His Thr Val Glu Leu Pro Cys Thr Ala Ser Gly Tyr Pro Ile Pro Ala
245 250 255

Ile Arg Trp Leu Lys Asp Gly Arg Pro Leu Pro Ala Asp Ser Arg Trp
260 265 270

Thr Lys Arg Ile Thr Gly Leu Thr Ile Ser Asp Leu Arg Thr Glu Asp
275 280 285

Ser Gly Thr Tyr Ile Cys Glu Val Thr Asn Thr Phe Gly Ser Ala Glu
290 295 300

Ala Thr Gly Ile Leu Met Val Ile Asp Pro Leu His Val Thr Leu Thr
305 310 315 320

Pro Lys Lys Leu Lys Thr Gly Ile Gly Ser Thr Val Ile Leu Ser Cys
325 330 335

Ala Leu Thr Gly Ser Pro Glu Phe Thr Ile Arg Trp Tyr Arg Asn Thr
340 345 350

Glu Leu Val Leu Pro Asp Glu Ala Ile Ser Ile Arg Gly Leu Ser Asn
355 360 365

Glu Thr Leu Leu Ile Thr Ser Ala Gln Lys Ser His Ser Gly Ala Tyr
370 375 380

Gln Cys Phe Ala Thr Arg Lys Ala Gln Thr Ala Gln Asp Phe Ala Ile
385 390 395 400

Ile Ala Leu Glu Asp Gly Thr Pro Arg Ile Val Ser Ser Phe Ser Glu
405 410 415

Lys Val Val Asn Pro Gly Glu Gln Phe Ser Leu Met Cys Ala Ala Lys
420 425 430

Gly Ala Pro Pro Pro Thr Val Thr Trp Ala Leu Asp Asp Glu Pro Ile
435 440 445

Val Arg Asp Gly Ser His Arg Thr Asn Gln Tyr Thr Met Ser Asp Gly
450 455 460

Thr
465

































(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1493 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:

(A) NAME/KEY: Coding Sequence

(B) LOCATION: 99...1493

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6: 

GGCACGAGGG CGGCTGGGAG CGCGCTGAGC GGGGGAGAGG CGCTGCCGCA CGGCCGGCCA 60
CAGGACCACC TCCCCGGAGA ATAGGGCCTC TTTATGGC ATG TGG CTG GTA ACT TTC 116
Met Trp Leu Val Thr Phe
1 5

CTC CTG CTC CTG GAC TCT TTA CAC AAA GCC CGC CCT GAA GAT GTT GGC 164
Leu Leu Leu Leu Asp Ser Leu His Lys Ala Arg Pro Glu Asp Val Gly
10 15 20

ACC AGC CTC TAC TTT GTA AAT GAC TCC TTG CAG CAG GTG ACC TTT TCC 212
Thr Ser Leu Tyr Phe Val Asn Asp Ser Leu Gln Gln Val Thr Phe Ser
25 30 35

AGC TCC GTG GGG GTG GTG GTG CCC TGC CCG GCC GCG GGC TCC CCC AGC 260
Ser Ser Val Gly Val Val Val Pro Cys Pro Ala Ala Gly Ser Pro Ser
40 45 50

GCG GCC CTT CGA TGG TAC CTG GCC ACA GGG GAC GAC ATC TAC GAC GTG 308
Ala Ala Leu Arg Trp Tyr Leu Ala Thr Gly Asp Asp Ile Tyr Asp Val
55 60 65 70

CCG CAC ATC CGG CAC GTC CAC GCC AAC GGG ACG CTG CAG CTC TAC CCC 356
Pro His Ile Arg His Val His Ala Asn Gly Thr Leu Gln Leu Tyr Pro
75 80 85

TTC TCC CCC TCC GCC TTC AAT AGC TTT ATC CAC GAC AAT GAC TAC TTC 404
Phe Ser Pro Ser Ala Phe Asn Ser Phe Ile His Asp Asn Asp Tyr Phe
90 95 100

TGC ACC GCG GAG AAC GCT GCC GGC AAG ATC CGG AGC CCC AAC ATC CGC 452
Cys Thr Ala Glu Asn Ala Ala Gly Lys Ile Arg Ser Pro Asn Ile Arg
105 110 115

GTC AAA GCA GTT TTC AGG GAA CCC TAC ACC GTC CGG GTG GAG GAT CAA 500
Val Lys Ala Val Phe Arg Glu Pro Tyr Thr Val Arg Val Glu Asp Gln
120 125 130

AGG TCA ATG CGT GGC AAC GTG GCC GTC TTC AAG TGC CTC ATC CCC TCT 548
Arg Ser Met Arg Gly Asn Val Ala Val Phe Lys Cys Leu Ile Pro Ser
135 140 145 150

TCA GTG CAG GAA TAT GTT AGC GTT GTA TCT TGG GAG AAA GAC ACA GTC 596
Ser Val Gln Glu Tyr Val Ser Val Val Ser Trp Glu Lys Asp Thr Val
155 160 165

TCC ATC ATC CCA GAA AAC AGG TTT TTT ATT ACC TAC CAC GGC GGG CTG 644
Ser Ile Ile Pro Glu Asn Arg Phe Phe Ile Thr Tyr His Gly Gly Leu
170 175 180

TAC ATC TCT GAC GTA CAG AAG GAG GAC GCC CTC TCC ACC TAT CGC TGC 692
Tyr Ile Ser Asp Val Gln Lys Glu Asp Ala Leu Ser Thr Tyr Arg Cys
185 190 195

ATC ACC AAG CAC AAG TAT AGC GGG GAG ACC CGG CAG AGC AAT GGG GCA 740
Ile Thr Lys His Lys Tyr Ser Gly Glu Thr Arg Gln Ser Asn Gly Ala
200 205 210

CGC CTC TCT GTG ACA GAC CCT GCT GAG TCG ATC CCC ACC ATC CTG GAT 788
Arg Leu Ser Val Thr Asp Pro Ala Glu Ser Ile Pro Thr Ile Leu Asp
215 220 225 230

GGC TTC CAC TCC CAG GAA GTG TGG GCC GGC CAC ACC GTG GAG CTG CCC 836
Gly Phe His Ser Gln Glu Val Trp Ala Gly His Thr Val Glu Leu Pro
235 240 245

TGC ACC GCC TCG GGC TAC CCT ATC CCC GCC ATC CGC TGG CTC AAG GAT 884
Cys Thr Ala Ser Gly Tyr Pro Ile Pro Ala Ile Arg Trp Leu Lys Asp
250 255 260

GGC CGG CCC CTC CCG GCT GAC AGC CGC TGG ACC AAG CGC ATC ACA GGG 932
Gly Arg Pro Leu Pro Ala Asp Ser Arg Trp Thr Lys Arg Ile Thr Gly
265 270 275

CTG ACC ATC AGC GAC TTG CGG ACC GAG GAC AGC GGC ACC TAC ATT TGT 980
Leu Thr Ile Ser Asp Leu Arg Thr Glu Asp Ser Gly Thr Tyr Ile Cys
280 285 290

GAG GTC ACC AAC ACC TTC GGT TCG GCA GAG GCC ACA GGC ATC CTC ATG 1028
Glu Val Thr Asn Thr Phe Gly Ser Ala Glu Ala Thr Gly Ile Leu Met
295 300 305 310

GTC ATT GAT CCC CTT CAT GTG ACC CTG ACA CCA AAG AAG CTG AAG ACC 1076
Val Ile Asp Pro Leu His Val Thr Leu Thr Pro Lys Lys Leu Lys Thr
315 320 325

GGC ATT GGC AGC ACG GTC ATC CTC TCC TGT GCC CTG ACG GGC TCC CCA 1124
Gly Ile Gly Ser Thr Val Ile Leu Ser Cys Ala Leu Thr Gly Ser Pro
330 335 340

GAG TTC ACC ATC CGC TGG TAT CGC AAC ACG GAG CTG GTG CTG CCT GAC 1172
Glu Phe Thr Ile Arg Trp Tyr Arg Asn Thr Glu Leu Val Leu Pro Asp
345 350 355

GAG GCC ATC TCC ATC CGT GGG CTC AGC AAC GAG ACG CTG CTC ATC ACC 1220
Glu Ala Ile Ser Ile Arg Gly Leu Ser Asn Glu Thr Leu Leu Ile Thr
360 365 370

TCG GCC CAG AAG AGC CAT TCC GGG GCC TAC CAG TGC TTC GCT ACC CGC 1268
Ser Ala Gln Lys Ser His Ser Gly Ala Tyr Gln Cys Phe Ala Thr Arg
375 380 385 390

AAG GCC CAG ACC GCC CAG GAC TTT GCC ATC ATT GCA CTT GAG GAT GGC 1316
Lys Ala Gln Thr Ala Gln Asp Phe Ala Ile Ile Ala Leu Glu Asp Gly
395 400 405

ACG CCC CGC ATC GTC TCG TCC TTC AGC GAG AAG GTG GTC AAC CCC GGG 1364
Thr Pro Arg Ile Val Ser Ser Phe Ser Glu Lys Val Val Asn Pro Gly
410 415 420

GAG CAG TTC TCA CTG ATG TGT GCG GCC AAG GGC GCC CCG CCC CCC ACG 1412
Glu Gln Phe Ser Leu Met Cys Ala Ala Lys Gly Ala Pro Pro Pro Thr
425 430 435

GTC ACC TGG GCC CTC GAC GAT GAG CCC ATC GTG CGG GAT GGC AGC CAC 1460
Val Thr Trp Ala Leu Asp Asp Glu Pro Ile Val Arg Asp Gly Ser His
440 445 450

CGC ACC AAC CAG TAC ACC ATG TCG GAC GGC ACC 1493
Arg Thr Asn Gln Tyr Thr Met Ser Asp Gly Thr
455 460 465
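
The feature table for SEQ ID NO: 6 places the coding sequence at positions 99 to 1493. As a minimal Python sketch, assuming the standard genetic code and 1-based inclusive coordinates, the snippet below checks that this span is a whole number of codons and translates the first six codons of the listing. The function names are hypothetical and the code is offered only as an illustration of how the nucleotide and amino acid lines correspond.

```python
from itertools import product

# Standard genetic code in the conventional compact TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter amino acid codes ("Ter" marks a stop codon).
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys", "Q": "Gln",
    "E": "Glu", "G": "Gly", "H": "His", "I": "Ile", "L": "Leu", "K": "Lys",
    "M": "Met", "F": "Phe", "P": "Pro", "S": "Ser", "T": "Thr", "W": "Trp",
    "Y": "Tyr", "V": "Val", "*": "Ter",
}


def codon_count(start: int, end: int) -> int:
    """Number of codons in a 1-based, inclusive coding span."""
    length = end - start + 1
    if length % 3 != 0:
        raise ValueError("coding span is not a multiple of three")
    return length // 3


def translate(dna: str) -> list:
    """Translate a DNA string, read in frame from its first base, to three-letter codes."""
    dna = dna.upper().replace(" ", "")
    codons = (dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3))
    return [THREE_LETTER[CODON_TABLE[codon]] for codon in codons]


print(codon_count(99, 1493))                # 465
print(translate("ATG TGG CTG GTA ACT TTC"))  # ['Met', 'Trp', 'Leu', 'Val', 'Thr', 'Phe']
```

The codon count of 465 agrees with the last residue number shown in the listing.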

(2) INFORMATION FOR SEQ ID NO: 7:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 462 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7:












Met Trp Leu Val Thr Phe Leu Leu Leu Leu Asp Ser Leu His Lys Ala
1 5 10 15

Arg Pro Glu Asp Val Gly Thr Ser Leu Tyr Phe Val Asn Asp Ser Leu
20 25 30

Gln Gln Val Thr Phe Ser Ser Ser Val Gly Val Val Val Pro Cys Pro
35 40 45

Ala Ala Gly Ser Pro Ser Ala Ala Leu Arg Trp Tyr Leu Ala Thr Gly
50 55 60

Asp Asp Ile Tyr Asp Val Pro His Ile Arg His Val His Ala Asn Gly
65 70 75 80

Thr Leu Gln Leu Tyr Pro Phe Ser Pro Ser Ala Phe Asn Ser Phe Ile
85 90 95

His Asp Asn Asp Tyr Phe Cys Thr Ala Glu Asn Ala Ala Gly Lys Ile
100 105 110

Arg Ser Pro Asn Ile Arg Val Lys Ala Val Phe Arg Glu Pro Tyr Thr
115 120 125

Val Arg Val Glu Asp Gln Arg Ser Met Arg Gly Asn Val Ala Val Phe
130 135 140

Lys Cys Leu Ile Pro Ser Ser Val Gln Glu Tyr Val Ser Val Val Ser
145 150 155 160

Trp Glu Lys Asp Thr Val Ser Ile Ile Pro Glu Asn Arg Phe Phe Ile
165 170 175

Thr


Tyr 


His 


Gly 

-LOW 


Gly 


Leu 


Leu 


Ser 


Thr 
195 


Tyr 


Arg 


Cys 


Arg 


Gin 
210 


Ser 


Asn 


Gly 


Ala 


He 


Pro 


Thr 


He 


Leu 


Asp 


225 










230 


His 


Thr 


Val 


Glu 


Leu 
245 


Pro 


He 


Arg 


Trp 


Leu 
260 


Lys 


Asp 


Thr 


Lys 


Arg 


He 


Thr 


Gly 


Ser 


Gly 
z 7 w 


Thr 


Tyr 


He 


Cys 


Ala 


Thr 


Gly 


He 


Leu 


Met 


305 








310 


Pro 


Lys 


Lys 


Leu 


Lys 
325 


Thr 


Ala 


Leu 


Thr 


Gly 
340 


Ser 


Pro 


Glu 


Leu 


Val 
355 


Leu 


Pro 


Asp 


Glu 


Thr 
370 


Leu 


Leu 


He 


Thr 


Gin 


Cys 


Phe 


Ala 


Thr 


Arg 


385 










390 


He 


Ala 


Leu 


Glu 


Asp 
405 


Gly 


Lys 


Val 


Val 


Asn 
420 


Pro 


Gly 


Gly 


Ala 


Pro 


Pro 


Pro 


Thr 




435 








Val 


Arg 


Asp 


Gly 


Ser 


His 



450
Tyr


He 


Ser 
185 


Asp 


Val 


Gin 


He 


Thr 
200 


Lys 


His 


Lys 


Tyr 


Arg 


Leu 


Ser 


Val 


Thr 


Asp 


215 










220 


Gly 


Phe 


His 


Ser 


Gin 
235 


Glu 


Cys 


Thr 


Ala 


Ser 
250 


Gly 


Tyr 


Gly 


Arg 


Pro 
265 


Leu 


Pro 


Ala 


Leu 


Thr 


He 


Ser 


Asp 


Leu 




280 








Glu 


Val 


Thr 


Asn 


Thr 


Phe 


295 










300 


val 


lie Asp 


Pro 


Leu 


His 










315 




Gly 


He Gly 


Ser 


Thr 


Val 








330 






Glu 


Phe 


Thr 
345 


He 


Arg 


Trp 


Glu 


Ala 


He 


Ser 


He 


Arg 




360 








Ser 


Ala 


Gin 


Lys 


Ser 


His 


375 








380 


Lys 


Ala 


Gin 


Thr 


Ala 
395 


Gin 


Thr 


Pro 


Arg 


He 
410 


Val 


Ser 


Glu 


Gin 


Phe 
425 


Ser 


Leu 


Met 


Val 


Thr 
440 


Trp 


Ala 


Leu 


Asp 


Arg 


Thr 


Asn 


Gin 


Tyr 


Thr 


455 










460 



PCT/US97/20201 





ulU 








1 go 

JL Z7 \J 






Ser 


Glv 


Glu 


Thr 


205 








Pro 


Ala 


Glu 


Ser 


Val 


Trp 


Ala 


Gly 








240 


Pro 


He 


Pro 


A3 a 

*V X d 






255 




Asp 


Ser 


Arg 


Trp 




270 






Arg 


Thr 


Glu 


Asp 


O 3 










A1 a 


VjIU 


Val 


Thr 


Leu 


Thr 








320 


He 


Leu 


Ser 








335 


xy x 


Arg 


Asn 


X l IX 




350 






Glv 


Leu 


Ser 


Asn 


365 








Ser 


Glv 


Ala 


xyir 


Asp 


Phe 


Ala 


lie 








400 


Ser 


Phe 


Ser 


Glu 






415 




Cys 


Ala 


Ala 


Lys 




430 






Asp 


Glu 


Pro 


He 


445 








Met 


Ser 







(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 605 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 



Met Lys Thr Pro Leu Leu Val Ser His Leu Leu Leu Ile Ser Leu Thr
1 5 10 15

Ser Cys Leu Gly Glu Phe Thr Trp His Arg Arg Tyr Gly His Gly Val
20 25 30

Ser Glu Glu Asp Lys Gly Phe Gly Pro Ile Phe Glu Glu Gln Pro Ile
35 40 45

Asn Thr Ile Tyr Pro Glu Glu Ser Leu Glu Gly Lys Val Ser Leu Asn
50 55 60

Cys Arg Ala Arg Ala Ser Pro Phe Pro Val Tyr Lys Trp Arg Met Asn
65 70 75 80

Asn Gly Asp Val Asp Leu Thr Asn Asp Arg Tyr Ser Met Val Gly Gly
85 90 95

Asn Leu Val Ile Asn Asn Pro Asp Lys Gln Lys Asp Ala Gly Ile Tyr
100 105 110

Tyr Cys Leu Ala Ser Asn Asn Tyr Gly Met Val Arg Ser Thr Glu Ala
115 120 125

Thr Leu Ser Phe Gly Tyr Leu Asp Pro Phe Pro Pro Glu Asp Arg Pro
130 135 140

Glu Val Lys Val Lys Glu Gly Lys Gly Met Val Leu Leu Cys Asp Pro
145 150 155 160

Pro Tyr His Phe Pro Asp Asp Leu Ser Tyr Arg Trp Leu Leu Asn Glu
165 170 175

Phe Pro Val Phe Ile Thr Met Asp Lys Arg Arg Phe Val Ser Gln Thr
180 185 190

Asn Gly Asn Leu Tyr Ile Ala Asn Val Glu Ser Ser Asp Arg Gly Asn
195 200 205

Tyr Ser Cys Phe Val Ser Ser Pro Ser Ile Thr Lys Ser Val Phe Ser
210 215 220

Lys Phe Ile Pro Leu Ile Pro Ile Pro Glu Arg Thr Thr Lys Pro Tyr
225 230 235 240

Pro Ala Asp Ile Val Val Gln Phe Lys Asp Ile Tyr Thr Met Met Gly
245 250 255

Gln Asn Val Thr Leu Glu Cys Phe Ala Leu Gly Asn Pro Val Pro Asp
260 265 270

Ile Arg Trp Arg Lys Val Leu Glu Pro Met Pro Thr Thr Ala Glu Ile
275 280 285

Ser Thr Ser Gly Ala Val Leu Lys Ile Phe Asn Ile Gln Leu Glu Asp
290 295 300

Glu Gly Leu Tyr Glu Cys Glu Ala Glu Asn Ile Arg Gly Lys Asp Lys
305 310 315 320

His Gln Ala Arg Ile Tyr Val Gln Ala Phe Pro Glu Trp Val Glu His
325 330 335

Ile Asn Asp Thr Glu Val Asp Ile Gly Ser Asp Leu Tyr Trp Pro Cys
340 345 350

Val Ala Thr Gly Lys Pro Ile Pro Thr Ile Arg Trp Leu Lys Asn Gly
355 360 365

Tyr Ala Tyr His Lys Gly Glu Leu Arg Leu Tyr Asp Val Thr Phe Glu
370 375 380

Asn Ala Gly Met Tyr Gln Cys Ile Ala Glu Asn Ala Tyr Gly Thr Ile
385 390 395 400

Tyr Ala Asn Ala Glu Leu Lys Ile Leu Ala Leu Ala Pro Thr Phe Glu
405 410 415

Met Asn Pro Met Lys Lys Lys Ile Leu Ala Ala Lys Gly Gly Arg Val
420 425 430

Ile Ile Glu Cys Lys Pro Lys Ala Ala Pro Lys Pro Lys Phe Ser Trp
435 440 445

Ser Lys Gly Thr Glu Trp Leu Val Asn Ser Ser Arg Ile Leu Ile Trp
450 455 460

Glu Asp Gly Ser Leu Glu Ile Asn Asn Ile Thr Arg Asn Asp Gly Gly
465 470 475 480

Ile Tyr Thr Cys Phe Ala Glu Asn Asn Arg Gly Lys Ala Asn Ser Thr
485 490 495

Gly Thr Leu Val Ile Thr Asn Pro Thr Arg Ile Ile Leu Ala Pro Ile
500 505 510

Asn Ala Asp Ile Thr Val Gly Glu Asn Ala Thr Met Gln Cys Ala Ala
515 520 525

Ser Phe Asp Pro Ser Leu Asp Leu Thr Phe Val Trp Ser Phe Asn Gly
530 535 540

Tyr Val Ile Asp Phe Asn Lys Glu Ile Thr Asn Ile His Tyr Gln Arg
545 550 555 560

Asn Phe Met Leu Asp Ala Asn Gly Glu Leu Leu Ile Arg Asn Ala Gln
565 570 575

Leu Lys His Ala Gly Arg Tyr Thr Cys Thr Ala Gln Thr Ile Val Asp
580 585 590

Asn Ser Ser Ala Ser Ala Asp Leu Val Val Arg Gly Pro
595 600 605
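A listing of this kind can be checked mechanically against its declared LENGTH. The following is a minimal sketch, not part of the original disclosure: the function names `parse_listing` and `check_length` are hypothetical, and the sketch assumes the listing uses only the twenty standard three-letter residue codes, with position markers appearing as bare integers.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def parse_listing(text):
    """Return the one-letter sequence encoded by a three-letter listing.

    Bare integers (position markers such as "1 5 10 15") are skipped.
    Any other unrecognised token raises ValueError, so that damaged
    tokens are reported rather than silently dropped.
    """
    residues = []
    for token in text.split():
        if token.isdigit():
            continue  # position marker, not a residue
        if token not in THREE_TO_ONE:
            raise ValueError(f"unrecognised residue token: {token!r}")
        residues.append(THREE_TO_ONE[token])
    return "".join(residues)


def check_length(text, declared_length):
    """Return (matches, sequence) for a listing and its declared LENGTH."""
    sequence = parse_listing(text)
    return len(sequence) == declared_length, sequence
```

Applied to the first sixteen residues of SEQ ID NO: 8 (`Met Lys Thr Pro Leu Leu Val Ser His Leu Leu Leu Ile Ser Leu Thr`), `parse_listing` returns `MKTPLLVSHLLLISLT`. Raising on unknown tokens is a deliberate choice here: a misread residue code would otherwise shorten the sequence without any warning.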



(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 615 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9:

Met Trp Arg Gln Ser Thr Ile Leu Ala Ala Leu Leu Val Ala Leu Leu
1 5 10 15

Cys Ala Gly Ser Ala Glu Ser Lys Gly Asn Arg Pro Pro Arg Ile Thr
20 25 30

Lys Gln Pro Ala Pro Gly Glu Leu Leu Phe Lys Val Ala Gln Gln Asn
35 40 45

Lys Glu Ser Asp Pro Glu Arg Asn Pro Phe Ile Ile Glu Cys Glu Ala
50 55 60

Asp Gly Gln Pro Glu Pro Glu Tyr Ser Trp Ile Lys Asn Gly Lys Lys
65 70 75 80

Phe Asp Trp Gln Ala Tyr Asp Asn Arg Met Leu Arg Gln Pro Gly Arg
85 90 95

Gly Thr Leu Val Ile Thr Ile Pro Lys Asp Glu Asp Arg Gly His Tyr
100 105 110

Gln Cys Phe Ala Ser Asn Glu Phe Gly Thr Ala Thr Ser Asn Ser Val
115 120 125

Tyr Val Arg Lys Ala Glu Leu Asn Ala Phe Lys Asp Glu Ala Ala Lys
130 135 140

Thr Leu Glu Ala Val Glu Gly Glu Pro Phe Met Leu Lys Cys Ala Ala
145 150 155 160

Pro Asp Gly Phe Pro Ser Pro Thr Val Asn Trp Met Ile Gln Glu Ser
165 170 175

Ile Asp Gly Ser Ile Lys Ser Ile Asn Asn Ser Arg Met Thr Leu Asp
180 185 190

Pro Glu Gly Asn Leu Trp Phe Ser Asn Val Thr Arg Glu Asp Ala Ser
195 200 205

Ser Asp Phe Tyr Tyr Ala Cys Ser Ala Thr Ser Val Phe Arg Ser Glu
210 215 220

Tyr Lys Ile Gly Asn Lys Val Leu Leu Asp Val Lys Gln Met Gly Val
225 230 235 240

Ser Ala Ser Gln Asn Lys His Pro Pro Val Arg Gln Tyr Val Ser Arg
245 250 255

Arg Gln Ser Ala Leu Arg Gly Lys Arg Met Glu Leu Phe Cys Ile Tyr
260 265 270

Gly Gly Thr Pro Leu Pro Gln Thr Val Trp Ser Lys Asp Gly Gln Arg
275 280 285

Ile Gln Trp Ser Asp Arg Ile Thr Gln Gly His Tyr Gly Lys Ser Leu
290 295 300

Val Ile Arg Gln Thr Asn Phe Asp Asp Ala Gly Thr Tyr Thr Cys Asp
305 310 315 320

Val Ser Asn Gly Val Gly Asn Ala Gln Ser Phe Ser Ile Ile Leu Asn
325 330 335

Val Asn Ser Val Pro Tyr Phe Thr Lys Glu Pro Glu Ile Ala Thr Ala
340 345 350

Ala Glu Asp Glu Glu Val Val Phe Glu Cys Arg Ala Ala Gly Val Pro
355 360 365

Glu Pro Lys Ile Ser Trp Ile His Asn Gly Lys Pro Ile Glu Gln Ser
370 375 380

Thr Pro Asn Pro Arg Arg Thr Val Thr Asp Asn Thr Ile Arg Ile Ile
385 390 395 400

Asn Leu Val Lys Gly Asp Thr Gly Asn Tyr Gly Cys Asn Ala Thr Asn
405 410 415

Ser Leu Gly Tyr Val Tyr Lys Asp Val Tyr Leu Asn Val Gln Ala Glu
420 425 430

Pro Pro Thr Ile Ser Glu Ala Pro Ala Ala Val Ser Thr Val Asp Gly
435 440 445

Arg Asn Val Thr Ile Lys Cys Arg Val Asn Gly Ser Pro Lys Pro Leu
450 455 460

Val Lys Trp Leu Arg Ala Ser Asn Trp Leu Thr Gly Gly Arg Tyr Asn
465 470 475 480

Val Gln Ala Asn Gly Asp Leu Glu Ile Gln Asp Val Thr Phe Ser Asp
485 490 495

Ala Gly Lys Tyr Thr Cys Tyr Ala Gln Asn Lys Phe Gly Glu Ile Gln
500 505 510

Ala Asp Gly Ser Leu Val Val Lys Glu His Thr Ile Thr Gln Glu Pro
515 520 525

Gln Asn Tyr Glu Val Ala Ala Gly Gln Ser Ala Thr Phe Arg Cys Asn
530 535 540

Glu Ala His Asp Asp Thr Leu Glu Ile Glu Ile Asp Trp Trp Lys Asp
545 550 555 560

Gly Gln Ser Ile Asp Phe Glu Ala Gln Pro Arg Phe Val Lys Thr Asn
565 570 575

Asp Asn Ser Leu Thr Ile Ala Lys Thr Met Glu Leu Asp Ser Gly Glu
580 585 590

Tyr Thr Cys Val Ala Arg Thr Arg Leu Asp Glu Ala Thr Ala Arg Ala
595 600 605

Asn Leu Ile Val Gln Asp Val
610 615





















(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 611 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10:

Met Val Val Ala Leu Arg Tyr Val Trp Pro Leu Leu Leu Cys Ser Pro
1 5 10 15

Cys Leu Leu Ile Gln Ile Pro Glu Glu Tyr Glu Gly His His Val Met
20 25 30

Glu Pro Pro Val Ile Thr Glu Gln Ser Pro Arg Arg Leu Val Val Phe
35 40 45

Pro Thr Asp Asp Ile Ser Leu Lys Cys Glu Ala Ser Gly Lys Pro Glu
50 55 60

Val Gln Phe Arg Trp Thr Arg Asp Gly Val His Phe Lys Pro Lys Glu
65 70 75 80

Glu Leu Gly Val Thr Val Tyr Gln Ser Pro His Ser Gly Ser Phe Thr
85 90 95

Ile Thr Gly Asn Asn Ser Asn Phe Ala Gln Arg Phe Gln Gly Ile Tyr
100 105 110

Arg Cys Phe Ala Ser Asn Lys Leu Gly Thr Ala Met Ser His Glu Ile
115 120 125

Arg Leu Met Ala Glu Gly Ala Pro Lys Trp Pro Lys Glu Thr Val Lys
130 135 140

Pro Val Glu Val Glu Glu Gly Glu Ser Val Val Leu Pro Cys Asn Pro
145 150 155 160

Pro Pro Ser Ala Glu Pro Leu Arg Ile Tyr Trp Met Asn Ser Lys Ile
165 170 175

Leu His Ile Lys Gln Asp Glu Arg Val Thr Met Gly Gln Asn Gly Asn
180 185 190

Leu Tyr Phe Ala Asn Val Leu Thr Ser Asp Asn His Ser Asp Tyr Ile
195 200 205

Cys His Ala His Phe Pro Gly Thr Arg Thr Ile Ile Gln Lys Glu Pro
210 215 220

Ile Asp Leu Arg Val Lys Ala Thr Asn Ser Met Ile Asp Arg Lys Pro
225 230 235 240

Arg Leu Leu Phe Pro Thr Asn Ser Ser Ser His Leu Val Ala Leu Gln
245 250 255

Gly Gln Pro Leu Val Leu Glu Cys Ile Ala Glu Gly Phe Pro Thr Pro
260 265 270

Thr Ile Lys Trp Leu Arg Pro Ser Gly Pro Met Pro Ala Asp Arg Val
275 280 285

Thr Tyr Gln Asn His Asn Lys Thr Leu Gln Leu Leu Lys Val Gly Glu
290 295 300

Glu Asp Asp Gly Glu Tyr Arg Cys Leu Ala Glu Asn Ser Leu Gly Ser
305 310 315 320

Ala Arg His Ala Tyr Tyr Val Thr Val Glu Ala Ala Lys Tyr Arg Ile
325 330 335

Gln Arg Gly Ala Leu Ile Leu Ser Asn Val Gln Pro Ser Asp Thr Met
340 345 350
Val


Thr 


Gin 


Cys 


Glu 


Ala 






355 






Ala 


Tyr 
370 


He 


Tyr 


Val 


Val 


Asn 


Gin 


Thr 


Tyr 


Met 


Ala 


385 








390 


His 


Leu 


Tyr 


Gly 


Pro 
405 


Gly 


Gly 


Arg 


Pro 


Gin 
420 


Pro 


Glu 


Glu 


Glu 


Leu 
435 


Ala 


Lys 


Asp 


Lys 


Ala 
450 


Phe 


Gly 


Ala 


Pro 


Gly 


Thr 


Thr 


Val 


Leu 


Gin 


465 










470 


Thr 


Leu 


Gly 


He 


Arg 
485 


Asp 


Cys 


Leu 


Ala 


Ala 
500 


Asn 


Asp 


Lys 


Val 


Lys 
515 


Asp 


Ala 


Thr 


Glu 


Lys 
530 


Lys 


Gly 


Ser 


Arg 


Pro 


Ser 


Leu 


Gin 


Pro 


Ser 


545 










550 


Gin 


Glu 


Leu 


Gly 


Asp 
565 


Ser 


Val 


He 


His 


Ser 
580 


Leu 


Asp 


Ala 


Ser 


Thr 
595 


Glu 


Leu 


Asp 


Val 


Gly 


Ser 









610
Arg


Asn 
360 


Arg 


His 


Gly 


Leu 


Gin 


Leu 


Pro 


Ala 


Lys 


lie 


375 








380 


Val 


Pro 


Tyr 


Trp 


Leu 
395 


His 


Glu 


Thr 


Ala 


Arg 
410 


Leu 


Asp 


Val 


Thr 


Trp 
425 


Arg 


He 


Asn 


Gin 


Gin 
440 


Gly 


Ser 


Thr 


Ala 


Val 


Pro 


Ser 


Val 


Gin 


Trp 


455 










460 


Asp 


Glu 


Arg 


Phe 


Phe 
475 


Pro 


Leu 


Gin 


Ala 


Asn 


Asp 


Thr 








490 




Gin 


Asn 


Asn 
505 


Val 


Thr 


He 


Gin 


He 


Thr 


Gin 


Gly 


Pro 




520 








Val 


Thr 


Phe 


Thr 


Cys 


Gin 


535 










540 


Tl - 

I le 


Thr 


Trp 


Arg 


Gly 
555 


Asp 


Asp 


Lys 


Tyr 


Phe 
570 


He 


Glu 


Tyr 


Ser 


Asp 
585 


Gin 


Gly 


Asn 


Val 


Val 


Glu 


Ser 


Arg 


Ala 



600
Leu


Leu 


Ala 


Asn 


365 








Leu 


Thr 


Ala 


Asp 


Lys 


Pro 


Gin 


Ser 










Cys 


Gin 


Val 


Gin 






415 




Gly 


He 


Pro 


Val 




430 






Tyr 


Leu 


Leu 


Cys 


445 








Leu 


Asp 


Glu 


Asp 


Tyr 


Ala 


Asn 


Gly 










Gly 


Arg 


Tyr 


Phe 






495 




Met 


Ala 


Asn 


Leu 




510 






Arg 


Ser 


Thr 


He 


525 








Ala 


Ser 


Phe 


Asp 


Gly 


Arg 


Asp 


Leu 








560 


Asp 


Gly 


Arg 


Leu 






575 




Tyr 


Ser 


Cys 


Val 




590 






Gin 


Leu 


Leu 


Val 



605 



(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 612 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11:

Met Met Lys Glu Lys Ser Ile Ser Ala Ser Lys Ala Ser Leu Val Phe
1 5 10 15

Phe Leu Cys Gln Met Ile Ser Ala Leu Asp Val Pro Leu Asp Ser Lys
20 25 30

Leu Leu Glu Glu Leu Ser Gln Pro Pro Thr Ile Thr Gln Gln Ser Pro
35 40 45

Lys Asp Tyr Ile Val Asp Pro Arg Glu Asn Ile Val Ile Gln Cys Glu
50 55 60

Ala Lys Gly Lys Pro Pro Pro Ser Phe Ser Trp Thr Arg Asn Gly Thr
65 70 75 80

His Phe Asp Ile Asp Lys Asp Ala Gln Val Thr Met Lys Pro Asn Ser
85 90 95

Gly Thr Leu Val Val Asn Ile Met Asn Gly Val Lys Ala Glu Ala Tyr
100 105 110

Glu Gly Val Tyr Gln Cys Thr Ala Arg Asn Glu Arg Gly Ala Ala Ile
115 120 125

Ser Asn Asn Ile Val Ile Arg Pro Ser Arg Ser Pro Leu Trp Thr Lys
130 135 140

Glu Lys Leu Glu Pro Asn His Val Arg Glu Gly Asp Ser Leu Val Leu
145 150 155 160

Asn Cys Arg Pro Pro Val Gly Leu Pro Pro Pro Ile Ile Phe Trp Met
165 170 175

Asp Asn Ala Phe Gln Arg Leu Pro Gln Ser Glu Arg Val Ser Gln Gly
180 185 190

Leu Asn Gly Asp Leu Tyr Phe Ser Asn Val Gln Pro Glu Asp Thr Arg
195 200 205

Val Asp Tyr Ile Cys Tyr Ala Arg Phe Asn His Thr Gln Thr Ile Gln
210 215 220

Gln Lys Gln Pro Ile Ser Val Lys Val Phe Ser Thr Lys Pro Val Thr
225 230 235 240

Glu Arg Pro Pro Val Leu Leu Thr Pro Met Gly Ser Thr Ser Asn Lys
245 250 255

Val Glu Leu Arg Gly Asn Val Leu Leu Leu Glu Cys Ile Ala Ala Gly
260 265 270

Leu Pro Thr Pro Val Ile Arg Trp Ile Lys Glu Gly Gly Glu Leu Pro
275 280 285

Ala Asn Arg Thr Phe Phe Glu Asn Phe Lys Lys Thr Leu Lys Ile Ile
290 295 300

Asp Val Ser Glu Ala Asp Ser Gly Asn Tyr Lys Cys Thr Ala Arg Asn
305 310 315 320

Thr Leu Gly Ser Thr His His Val Ile Ser Val Thr Val Lys Ala Ala
325 330 335

Pro Tyr Trp Ile Thr Ala Pro Arg Asn Leu Val Leu Ser Pro Gly Glu
340 345 350

Asp Gly Thr Leu Ile Cys Arg Ala Asn Gly Asn Pro Lys Pro Ser Ile
355 360 365

Ser Trp Leu Thr Asn Gly Val Pro Ile Ala Ile Ala Pro Glu Asp Pro
370 375 380

Ser Arg Lys Val Asp Gly Asp Thr Ile Ile Phe Ser Ala Val Gln Glu
385 390 395 400

Arg Ser Ser Ala Val Tyr Gln Cys Asn Ala Ser Asn Glu Tyr Gly Tyr
405 410 415

Leu Leu Ala Asn Ala Phe Val Asn Val Leu Ala Glu Pro Pro Arg Ile
420 425 430

Leu Thr Pro Ala Asn Lys Leu Tyr Gln Val Ile Ala Asp Ser Pro Ala
435 440 445

Leu Ile Asp Cys Ala Tyr Phe Gly Ser Pro Lys Pro Glu Ile Glu Trp
450 455 460

Phe Arg Gly Val Lys Gly Ser Ile Leu Arg Gly Asn Glu Tyr Val Phe
465 470 475 480

His Asp Asn Gly Thr Leu Glu Ile Pro Val Ala Gln Lys Asp Ser Thr
485 490 495

Gly Thr Tyr Thr Cys Val Ala Arg Asn Lys Leu Gly Lys Thr Gln Asn
500 505 510

Glu Val Gln Leu Glu Val Lys Asp Pro Thr Met Ile Ile Lys Gln Pro
515 520 525

Gln Tyr Lys Val Ile Gln Arg Ser Ala Gln Ala Ser Phe Glu Cys Val
530 535 540

Ile Lys His Asp Pro Thr Leu Ile Pro Thr Val Ile Trp Leu Lys Asp
545 550 555 560

Asn Asn Glu Leu Pro Asp Asp Glu Arg Phe Leu Val Gly Lys Asp Asn
565 570 575

Leu Thr Ile Met Asn Val Thr Asp Lys Asp Asp Gly Thr Tyr Thr Cys
580 585 590

Ile Val Asn Thr Thr Leu Asp Ser Val Ser Ala Ser Ala Val Leu Thr
595 600 605

Val Val Ala Ala
610



(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 607 amino acids 

(B) TYPE: amino acid 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12:



Met 


P 1 ir 

biy 


i nr 


Ala 

Aia 


J. 11 IT 


Arg 


Arg 


Lys 


Pro 


HIS 


Leu 


Leu 


Leu 


Val 


Ala 


Ala 


1 








C 










1 n 

1 VJ 










15 




17a 1 


Aia 


Leu 


vai 

on 
z u 


oer 


Ser 


Ser 


2\ 1 a 

Aia 


m 

Trp 

Z 3 


£>er 


Ser 


7V 1 a 

Ala
Lys


Ser 


vai 


Phe 


Ser 


Lys 


Phe 




Z 1U 










Z 13 










Z Z VJ 








Aia 


bin 


Leu 


Asn 


Leu 


Air* 

Aia 


Ala 

Aia 


pi 11 

vlu 


Asp 


inr 


Arg 


Leu 


Phe 


Ala 


Pro 


Ser 


ZZ3 










Z JU 








235 










240 


lie 


iiys 


li 1 a 

Aia 


Arg 


pne 


Pro 


TV 1 a 

Aia 


p 1 11 
bill 


rpl_ 

± nr 


m 

Tyr 


21 1 a 

Aia 


Leu 


Val 


Gly 


Gin 


Gin 
















^ 3 VJ 








255 




vai 


mr 


Leu 


Val U 


Cys 


fne 


TV 1 a 

Aia 


rne 


P 1 *r 

biy 


Asn 


Pro 


17a 1 

vai 


Pro 


Arg 


He 


Lys 








o en 

Z DU 










Z D 3 










270 




Trp 


Arg 


Lys 

Z / 3 


vai 


Asp 


biy 


C AT* 


Leu 
0 pn 

Z OVJ 




Pro 


P 1 tn 

bin 


Trp 


Thr 
285 


Thr 


Ala 


Glu 


Fro 


1 nr 


Leu 


pt 
bin 


i ie 


Pro 


Ser 


1 1 a 1 

vai 


Ser 


p/ie 


p 1 »i 

blU 


7V AVI 

ASp 


Glu Gly Thr Tyr 




zyu 










Z 37 3 










inn 
3 uu 










blU 


Cys 


blU 


Aia 


P 1 i« 

G1U 


Asn 


Ser 


Lys 


p i *» 
biy 


TV 

Arg 


Asp 


Thr 


Val 


Gin Gly Arg 


one 
JUD 










n n 

J1U 










11c 

313 










320 


i le 


T 1 o 

i ie 


vai 


bin 


TV 1 a 

Aia 


bin 


Pro 


pin 

ulU 


Trp 


Leu 


Lys 


vai 


He 


Ser 


Asp 


Thr 










J Z 3 










3 3 w 








335 




olU 


Aia 


Asp 


i ie 


pl,» 
biy 


Ser 


Asn 


Leu 


Arg 


Trp 


biy 


Cys 


Ala 


Ala 


Ala Gly 


















3 ** 3 










350 






L»y S 


Pro 


Arg 


Pro 


i nr 


17a 1 

vai 


Arg 


Trp 


Leu 


Arg 


Asn 


p i *p 
biy 


Glu 


Pro 


Leu 


Ala 






ice 










1 An 

3 DU 








365 








Ser 


bin 


Asn 


Arg 


If a 1 

vai 


blU 


17a 1 

v ai 


Leu 


Aia 


p 1 
biy 


Asp 


Leu 


Arg 


Phe 


Ser 


Lys 




o / vj 










3/3 










380 






Leu 


Ser 


Leu 


ulu 


Asp 


Ser 


biy 




rp 

Tyr 


n i n 
bin 


Cys 


17a 1 

vai 


Ala 


Glu 


Asn 


Lys 


J O 3 










'son 










3 7 3 










400 




biy 


x nr 


T 1 0 

i ie 


Tyr 


aia 
Aia 


Cor 


Z\ 1 a 

Aia 


r* 1 11 

blU 


Leu 


Bla 

Aia 


17a 1 

vai 


Gin 


Ala 


Leu 


Ala 


















** 1 VJ 










415 




Pro 


Asp 


Phe 


Arg 


Leu 


Asn 


Pro 


17a 1 

v ai 


Arg 


Arg 


Leu 


1 ie 


Pro 


Ala 


Ala 


Arg 








420 










425 










430 




Gly 


Gly 


Glu 


He 


Leu 


He 


Pro 


Cys 


Gin 


Pro 


Arg 


Ala 


Ala 


Pro 


Lys 


Ala 






435 










440 










445 






Val 


Val 


Leu 


Trp 


Ser 


Lys 


Gly 


Thr 


Glu 


He 


Leu 


Val 


Asn 


Ser 


Ser 


Arg 




450 










455 










460 








Val 


Thr 


Val 


Thr 


Pro 


Asp 


Gly 


Thr 


Leu 


He 


He 


Arg 


Asn 


He 


Ser 


Arg 


465 










470 










475 








480 


Ser 


Asp 


Glu 


Gly 


Lys 


Tyr 


Thr 


Cys 


Phe 


Ala 


Glu 


Asn 


Phe 


Met 


Gly 


Lys 










485 










490 










495 


Ala 


Asn 


Ser 


Thr 


Gly 


He 


Leu 


Ser 


Val 


Arg 


Asp 


Ala 


Thr 


Lys 


He 


Thr 






Pro 


500 










505 










510 






Leu 


Ala 


Ser 


Ser 


Ala 


Asp 


He 


Asn 


Leu 


Gly 


Asp 


Asn 


Leu 


Thr 


Leu 



515 520 525 






WO 98/22491 
























PCT/US97/20201
































Gin 


Cys 


His 


Ala 


Ser 


His 


Asp 


Pro 


Thr 


Met 


Asp 


Leu 


Thr 


Phe 


Thr 


Trp 




530 










535 










540 








Thr 


Leu 


Asp 


Asp 


Phe 


Pro 


He 


Asp 


Phe 


Asp 


Lys 


Pro 


Gly 


Gly 


His 


Tyr 


545 










550 










555 










560 


Arg 


Arg 


Thr 


Asn 


Val 
565

(2) INFORMATION FOR SEQ ID NO: 13:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 596 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13:

Met Leu Ser Trp Lys Gln Leu Ile Leu Leu Ser Phe Ile Gly Cys Leu
1 5 10 15

Ala Gly Glu Leu Leu Leu Gln Gly Pro Val Phe Val Lys Glu Pro Ser
20 25 30

Asn Ser Ile Phe Pro Val Gly Ser Glu Asp Lys Lys Ile Thr Leu Asn
35 40 45

Cys Glu Ala Arg Gly Asn Pro Ser Pro His Tyr Arg Trp Gln Leu Asn
50 55 60

Gly Ser Asp Ile Asp Thr Ser Leu Asp His Arg Tyr Lys Leu Asn Gly
65 70 75 80

Gly Asn Leu Ile Val Ile Asn Pro Asn Arg Asn Trp Asp Thr Gly Ser
85 90 95

Tyr Gln Cys Phe Ala Thr Asn Ser Leu Gly Thr Ile Val Ser Arg Glu
100 105 110

Ala Lys Leu Gln Phe Ala Tyr Leu Glu Asn Phe Lys Ser Arg Met Arg
115 120 125

Ser Arg Val Ser Val Arg Glu Gly Gln Gly Val Val Leu Leu Cys Gly
130 135 140

Pro Pro Pro His Ser Gly Glu Leu Ser Tyr Ala Trp Val Phe Asn Glu
145 150 155 160

Tyr Pro Ser Phe Val Glu Glu Asp Ser Arg Arg Phe Val Ser Gln Glu
165 170 175

Thr Gly His Leu Tyr Ile Ala Lys Val Glu Pro Ser Asp Val Gly Asn
180 185 190

Tyr Thr Cys Val Val Thr Ser Thr Val Thr Asn Ala Arg Val Leu Gly
195 200 205

Ser Pro Thr Pro Leu Val Leu Arg Ser Asp Gly Val Met Gly Glu Tyr
210 215 220

Glu Pro Lys Ile Glu Leu Gln Phe Pro Glu Thr Leu Pro Ala Ala Lys
225 230 235 240

Gly Ser Thr Val Lys Leu Glu Cys Phe Ala Leu Gly Asn Pro Val Pro
245 250 255

Gln Ile Asn Trp Arg Arg Ser Asp Gly Met Pro Phe Pro Thr Lys Ile
260 265 270

Lys Leu Arg Lys Phe Asn Gly Val Leu Glu Ile Pro Asn Phe Gln Gln
275 280 285

Glu Asp Thr Gly Ser Tyr Glu Cys Ile Ala Glu Asn Ser Arg Gly Lys
290 295 300

Asn Val Ala Arg Gly Arg Leu Thr Tyr Tyr Ala Lys Pro Tyr Trp Val
305 310 315 320

Gln Leu Leu Lys Asp Val Glu Thr Ala Val Glu Asp Ser Leu Tyr Trp
325 330 335

Glu Cys Arg Ala Ser Gly Lys Pro Lys Pro Ser Tyr Arg Trp Leu Lys
340 345 350

Asn Gly Asp Ala Leu Val Leu Glu Glu Arg Ile Gln Ile Glu Asn Gly
355 360 365

(2) INFORMATION FOR SEQ ID NO: 14:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 630 amino acids

(B) TYPE: amino acid
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:

35 40 45

Val

Ser

Gln Asp Ala

195

Gln Thr Asp Tyr Ser Cys Asn Ala Arg Phe His Phe Thr His Thr Ile
210 215 220

Gln Gln Lys Asn Pro Tyr Thr Leu Lys Val Lys Thr Lys Lys Pro His
225 230 235 240

Asn Glu Thr Ser Leu Arg Asn His Thr Asp Met Tyr Ser Ala Arg Gly
245 250 255

Val Thr Glu Thr Thr Pro Ser Phe Met Tyr Pro Tyr Gly Thr Ser Ser
260 265 270

Ser Gln Met Val Leu Arg Gly Val Asp Leu Leu Leu Glu Cys Ile Ala
275 280 285

Ser Gly Val Pro Ala Pro Asp Ile Met Trp Tyr Lys Lys Gly Gly Glu
290 295 300

Leu Pro Ala Gly Lys Thr Lys Leu Glu Asn Phe Asn Lys Ala Leu Arg
305 310 315 320

Ile Ser Asn Val Ser Glu Glu Asp Ser Gly Glu Tyr Phe Cys Leu Ala
325 330 335

Ser Asn Lys Met Gly Ser Ile Arg His Thr Ile Ser Val Arg Val Lys
340 345 350

Ala Ala Pro Tyr Trp Leu Asp Glu Pro Gln Asn Leu Ile Leu Ala Pro
355 360 365

Gly Glu Asp Gly Arg Leu Val Cys Arg Ala Asn Gly Asn Pro Lys Pro
370 375 380

Ser Ile Gln Trp Leu Val Asn Gly Glu Pro Ile Glu Gly Ser Pro Pro
385 390 395 400

Asn Pro Ser Arg Glu Val Ala Gly Asp Thr Ile Val Phe Arg Asp Thr
405 410 415

Gln Ile Gly Ser Ser Ala Val Tyr Gln Cys Asn Ala Ser Asn Glu His
420 425 430

Gly Tyr Leu Leu Ala Asn Ala Phe Val Ser Val Leu Asp Val Pro Pro
435 440 445

Arg Ile Leu Ala Pro Arg Asn Gln Leu Ile Lys Val Ile Gln Tyr Asn
450 455 460

Arg Thr Arg Leu Asp Cys Pro Phe Phe Gly Ser Pro Ile Pro Thr Leu
465 470 475 480

Arg Trp Phe Lys Asn Gly Gln Gly Asn Met Leu Asp Gly Gly Asn Tyr
485 490 495

Lys Ala His Glu Asn Gly Ser Leu Glu Met Ser Met Ala Arg Lys Glu
500 505 510

Asp Gln Gly Ile Tyr Thr Cys Val Ala Thr Asn Ile Leu Gly Lys Val
515 520 525

Glu Ala Gln Val Arg Leu Glu Val Lys Asp Pro Thr Arg Ile Val Arg
530 535 540

Gly Pro Glu Asp Gln Val Val Lys Arg Gly Ser Met Pro Arg Leu His
545 550 555 560

Cys Arg Val Lys His Asp Pro Thr Leu Lys Leu Thr Val Thr Trp Leu
565 570 575

Lys Asp Asp Ala Pro Leu Tyr Ile Gly Asn Arg Met Lys Lys Glu Asp
580 585 590

Asp Gly Leu Thr Ile Tyr Gly Val Ala Glu Lys Asp Gln Gly Asp Tyr
595 600 605

Thr Cys Val Ala Ser Thr Glu Leu Asp Lys Asp Ser Ala Lys Ala Tyr
610 615 620

Leu Thr Val Leu Ala Ile
625 630



What is claimed is:

1. A method for identifying a cDNA nucleic acid
encoding a mammalian protein having a signal sequence,
the method comprising:

a) providing a library of mammalian cDNA;

b) ligating said library of mammalian cDNA to DNA
encoding alkaline phosphatase lacking both a signal
sequence and a membrane anchor sequence to form ligated
DNA;

c) transforming bacterial cells with said ligated
DNA to create a bacterial cell clone library;

d) isolating DNA comprising said mammalian cDNA
from at least one clone in said bacterial cell clone
library;

e) separately transfecting DNA isolated from
clones in step (d) into mammalian cells which do not
express alkaline phosphatase to create a mammalian cell
clone library wherein each clone in said mammalian cell
clone library corresponds to a clone in said bacterial
cell clone library;

f) identifying a clone in said mammalian cell
clone library which expresses alkaline phosphatase;

g) identifying the clone in said bacterial cell
clone library corresponding to said clone in said
mammalian cell clone library identified in step (f); and

h) isolating and sequencing a portion of the
mammalian cDNA present in said bacterial cell library
clone identified in step (g) to identify a mammalian cDNA
encoding a mammalian protein having a signal sequence.

2. The method of claim 1 wherein said library of
mammalian cDNA is ligated to ptrAP3.

3. The method of claim 1 wherein said mammalian
cells are COS7 cells.

4. The method of claim 1 wherein said bacterial
cells are E. coli.

5. The expression vector ptrAP3.

6. The expression vector of claim 5, comprising
the sequence of SEQ ID NO: 1.

7. The protein of SEQ ID NO: 5.

8. An isolated nucleic acid sequence encoding the
amino acid sequence of SEQ ID NO: 5.

9. A vector comprising the nucleic acid sequence
of claim 8.

10. The vector of claim 9 wherein said vector is
an expression vector.

11. A genetically engineered host cell comprising
the nucleic acid sequence of claim 5.
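
Steps (e) through (g) of claim 1 depend on keeping a one-to-one record of which mammalian cell clone was transfected with DNA from which bacterial clone. The following is a minimal bookkeeping sketch, not part of the claimed method: the function name, the clone identifiers, the numeric activity readings and the threshold are all hypothetical, since the claims do not specify how alkaline phosphatase expression is scored.

```python
def positive_bacterial_clones(mammalian_to_bacterial, ap_signal, threshold):
    """Trace AP-positive mammalian clones back to their bacterial clones.

    mammalian_to_bacterial: dict mapping each mammalian clone id to the
        bacterial clone id whose DNA it was transfected with (step e).
    ap_signal: dict mapping each mammalian clone id to a measured
        alkaline phosphatase reading (hypothetical numeric score).
    threshold: reading above which a clone is called positive (step f).

    Returns the sorted bacterial clone ids to carry into step (h).
    """
    positives = []
    for mammalian_id, signal in ap_signal.items():
        if signal > threshold:
            # Step (g): look up the corresponding bacterial clone.
            positives.append(mammalian_to_bacterial[mammalian_id])
    return sorted(positives)


# Hypothetical example: only clone M2 scores above the threshold.
clone_map = {"M1": "B1", "M2": "B2", "M3": "B3"}
readings = {"M1": 0.1, "M2": 2.4, "M3": 0.3}
print(positive_bacterial_clones(clone_map, readings, threshold=1.0))
```

Keeping the mapping explicit is what allows step (g) to be a simple lookup, so that the bacterial clone never needs to be re-derived from the mammalian cells.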
aagcttggctgtggaatgtgtgtcagttagggtgtggaaagtccccaggctccccagcaggcagaagtatgc

aaagcatgcatctcaattagtcagcaaccaggtgtggaaagtccccaggctccccagcaggcagaagtatgc 

aaagcatgcatctcaattagtcagcaaccatagtccbgcccctaactccgcccatcccgcccctaactccgc 

ccagttccgcccattctccgccccAtggctgactaattttttttatttatgcagaggccgaggccgcctcgg 

cctctgagctattccagaagtagtgaggaggcttttttggaggcctaggcttttgcaaaaagctcctccgat 

cgaggggctcgcatctctccttcacgcgcgcgccgccctacctgaggccgccatccacgccggttgagtcgc 

gttctgccgcctcccgcctgtggtgcctcctgaactgcgtccgccgtctaggtaagtttaaagctcaggtcg 

agaccgggcctttgtccggcgctcccttggagcctacctagactcagccggctctccacgctttgcctgacc 

ctgcttgctcaactctacgtctttgtttcgttttctgttctc 

agaaagttaactggtaagtttagtctttttgtcttttatttcaggtcccaggtcccggatc 

AAGCTGCGqAATTCqCACGACCqTAqTTTTTACgCCCqqTqAqcaCTCCAgCCgCACCTACA 

AQCOgflTQTATflXTflXQflTQTXCgQCQACOXQQACCTQCTTQXQCXQQCCAACflXflCQCCT 



XXGCCXXCXCCTXQCCTXXXQCCCQTQXCXCTQCXQCXaOTQCTQCCCACQCMQCACCgT 

ccQXtoxxAXQCflCQQCCTXAAfleacaxaTCTaQTOie-rTOBoeeeAccatocAoeTaiT 
aaTxeccxxacQgcxaeaxcTaaxxaxTaTCTTaaxxxxxxTaxccoTfloxoccTaafleTa 
■aAQeeeaxoBTecacBTaeaaccxxTexxBcxaBTaaexccQaBxcTQaocBTBcxaxeea 
TOQXCBTTCxaATXceexecxeexaTxacxeTxafXTToecxefflceAexaxflaaexTBflx 

GAACCGCGAGGCAGaCGAGGCCCTGGGTGCCGCCAAGAAGCTGCAGCCTGCACAGACAGCCGCCAAGAACCT 

CATCArrrrccTcir^nAvr^.ATr^^TnTCTAntt 

GGACAAACTTGGt^nr'r^^AGATACCCCTGGCCATGGA^CGrTTCCCATATGTGGfyrCTGTr^^AAGACATAhAA 
TGTAGACAAACATfSTGCCAGACAGTGGAGCCACAGCCACt^CCTACCTGTGCGGGGTCAAG^if^AACTTCf'A 
GA(?CATTGGCTTY?AGTGFAGCCGCCCGr , TT^AAC(?AGTnCAACACGACACG^GGCAA(?GArtf!'reATCTV!CGT 

ASCCGffrACCTArerrrArAro^AArr^^ 
reAfflyOTTScrAreArAreccTACcrA^^
AAAGTACAT GTTTCGCA TGGGAACCCCAGACCCTGAGTACCCAGA TGACTACAGCCAAGGTGGGACCAGGCT
GGACGGGAAGAA TCTG GTGCAGGAA TGGCTGGCGAAGCGCCAGGGTGCCCGGTATGTGTGGAACCGCACTGA 
G CTCA TGCA GGCTTCC CTGGA CCCGTCTGTGA CCCA TCTCA TGGGTCTCTTTGAGCCTGGA G A CA TGA AA TA 
CGAGATCCACCGAGACT CCACACTGGACCCCTCCCTGATGGAGATGACAGAGGCTGCCCTGCGCCTGCTGAG 
CAGGAA CCCCCGCGGCTTCTTCCTCTTCGTGGAGGGTGGTCGCATCGACCATGGTCA TCA TGAAAGCAGGGC 
TTAGCGGGC ACTGA CTGAGACGA TC A TGTTCGACGACGCCA TTGAGAGGGCGGGCCAGGTCAC CAGCGAGGA 
GGA CA CGCTGA GCCTCGTCA CTGC CGACCA CTC CCACGTCTTCTCCTTCGGAGGCTA CCCCCTGCGAGGGAG 
C TCCA TCTTCGGGCTGGCCCCTGGCAA GGC CCGGGA CAGGAA GGCCTA CA CGGTCCTCCTA TACGGAAACGG 

GCAGCAGTCAGCAGTGCCCCTGGACGAAGAGACCCACGCAGGCGAGGACGTGGCGGTGTTCGCGCGCGGCCC 
GCAGGCGCACCTGGTTCA CGGCGTGC A GGAGCAGA CCTTCA TAGCGCA CGTCA TGGCCTTCGCCGCCTGCCT 
GGAGCCCTA CA CCGCCTGCGA CCTGGC GCCC CCC GCCGGC A CCACCGA CGCCGCGCACC CGGGTTGAACT AG 
TCTAGAGAAAAAACCTCCCACACCTCCCCCTGAACCTGAAACATAAAATGAATGCAATTGTTGTTGTTAACT 
TGTTTATTGCAGCTTATAATGGTTACAAATAAAGCAATAGCATCACAAATTTCACAAATAAAGCATTTTTTT 
CACTGCATTCTAGTTGTGGTTTGTCCAAACTCATCAATGTATCTTATCATGTCTGGATCCCCGGGTACCGAG 
CTCGAATTAATTCCTCTTCCGCTTCCTCGCTCACTGACTCGCTGCGCTCGGTCGTTCGGCTGCGGCGAGCGG 
TATCAGCTCACTCAAAGGCGGTAATACGGTTATCCACAGAATCAGGGGATAACGCAGGAAAGAACATGTGAG 
CAAAAGGCCAGCAAAAGGCCAGGAACCGTAAAAAGGCCGCGTTGCTGGCGTTTTTCCATAGGCTCCGCCCCC
C TGACGAGCATC AC AAAAATCGACGCTC AAGTC AGAGGTGGCGAAAC C CGACAGGACTATAAAGATACCAGG 
CGTTTCCCCCTGGAAGCTCCCTCGTGCGCTCTCCTGTTCCGACCCTGCCGCTTACCGGATACCTGTCCGCCT 
TTCTCCCTTCGGGAAGCGTGGCGCTTTCTCAATGCTCACGCTGTAGGTATGTCAGTTCGGTGTAGGTCGTTC 
GCTCCAAGCTGGGCTGTGTGCACGAACCCCCCGTTCAGCCCGACCGCTGCGCCTTATCCGGTAACTATCGTC 
TTGAGTCCAACCCGGTAAGACACGACTTATCGCCACTGGCAGCAGCCACTGGTAACAGGATTAGCAGAGCGA 
GGTATGTAGGCGGTGCTACAGAGTTCTTGAAGTGGTGGCCTAACTACGGCTACACTAGAAGGACAGTATTTG 
GTATCTGCGCTCTGCTGAAGCCAGTTACCTTCGGAAAAAGAGTTGGTAGCTCTTGATCGGGCAAACAAACCA 
CCGCTGGTAGCGGTGGTTTTTTTGTTTGCAAGCAGCAGATTACGCGCAGAAAAAAAGGATCTCAAGAAGATC 
CTTTGATCTTTTCTACGGGGTCTGACGCTCAGTGGAACGAAAACTCACGTTAAGGGATTTTGGTCATGAGAT 
TATCAAAAAGGATCTTCACCTAGATCCTTTTAAATTAAAAATGAAGTTTTAAATCAATCTAAAGTATATATG 
AGTAAACTTGGTCTGACAGTTACCAATGCTTAATCAGTGAGGCACCTATCTCAGCGATCTGTCTATTTCGTT 
CATCCATAGTTGCCTGACTCCCCGTCGTGTAGATAACTACGATACGGGAGGGCTTACCATCTGGCCCCAGTG 
CTGCAATGATACCGCGAGACCCACGCTCACCGGCTCCAGATTTATCAGCAATAAACCAGCCAGCCGGAAGGG 
CCGAGCGCAGAAGTGGTCCTGCAACTTTATCCGCCTCCATCCAGTCTATTAATTGTTGCCGGGAAGCTAGAG 
TAAGTAGTTCGCCAGTTAATAGTTTGCGCAACGTTGTTGCCATTGCTACAGGCATCGTGGTGTCACGCTCGT 
CGTTTGGTATGGCTTCATTCAGCTCCGGTTCCCAACGATCAAGGCGAGTTACATGATCCCCCATGTTGTGCA 
AAAAAGCGGTTAGCTCCTTCGGTCCTCCGATCGTTGTCAGAAGTAAGTTGGCCGCAGTGTTATCACTCATGG 
TTATGGCAGCACTGCATAATTCTCTTACTGTCATGCCATCCGTAAGATGCTTTTCTGTGACTGGTGAGTACT

caaccaagtcattctgagaatagtgtatgcckk:gaccgagttgctcttgcccggcgtcaatacgggataata 
ccgcgccacatagcagaactttaaaagtgctcatcattggaaaac^ 

tcttaccgctgttgagatccagttcgatgtaacccactcgtgcacccaactgatcttcagcatcttttactt 
tcaccagcgtttctgggtgagcaaaaacaggaaggcaaaatgccgcaaaaaagggaataagggcgacacgga
aatgttgaatactcatactcttcctttttcaatattattgaagcatttatcagggttattgtctcatgagcg 
gatacatatttgaatgtatttagaaaaataaacaaataggggttccgcgcacatttccccgaaaagtgccac 

CTGC CsetZ. ^ /O^jt 
FIG. 3

MLLLLLLLGLRLOLSLGIIPVEEENPDFWNREAAEALGAAKKLQPAQTAAKNLI
IFLGDGMGVSTVTAARILKGQKKDKLGPEIPLAMDRFPYVALSKTYNVDKHVPD
SGATATAYLCGVKGNFQTIGLSAAARFNQCNTTRGNEVISVMNRAKKAGKSVGV
VTTTRVQHASPAGTYAHTVNRNWYSDADVPASARQEGCQDIATQLISNMDIDVI
LGGGRKYMFRMGTPDPEYPDDYSQGGTRLDGKNLVQEWLAKRQGARYVWNRTEL 
MQASLDPSVTHLMGLFEPGDMKYEIHRDSTLDPSLMEMTEAALRLLSRNPRGFF 
LFVEGGRIDHGHHESRAYRALTETIMFDDAIERAGQLTSEEDTLSLVTADHSHV 
FSFGGYPLRGSSIFGLAPGKARDRKAYTVLLYGNGPGYVLKDGARPDVTESESG 
SPEYRQQSAVPLDEETHAGEDVAVFARGPQAHLVHGVQEQTFIAHVMAFAACLE

PYTACDLAPPAGTTDAAHPG RSWPALLPLLAGTLLLLETATAP 

{j>6^. it. ajo:^ 

FIG. 4

IIPVEEENPDFWNREAAEALGAAKKLQPAQTAAKNLIIFLGDGMGVSTVTAARI

LKGQKKDKLGPEIPLAMDRFPYVALSKTYNVDKHVPDSGATATAYLCGVKGNFQ 

TIGLSAAARFNQCNTTRGNEVISVMNRAKKAGKSVGVVTTTRVQHASPAGTYAH

TVNRNWYSDADVPASARQEGCQDIATQLISNMDIDVILGGGRKYMFRMGTPDPE 

YPDDYSQGGTRLDGKNLVQEWLAKRQGARYVWNRTELMQASLDPSVTHLMGLFE' 

PGDMKYEIHRDSTLDPSLMEMTEAALRLLSRNPRGFFLFVEGGRIDHGHHESRA 

YRALTETIMFDDAIERAGQLTSEEDTLSLVTADHSHVFSFGGYPLRGSSIFGLA 

PGKARDRKAYTVLLYGNGPGYVLKDGARPDVTESESGSPEYRQQSAVPLDEETH 

AGEDVAVFARGPQAHLVHGVQEQTFIAHVMAFAACLEPYTACDLAPPAGTTDAA

HPG (s<=z? dJo:i) 
QDFAIIALEDGTPRIVSSFS 415

CAG GAC TTT GCC ATC ATT GCA CTT GAG GAT GGC ACG CCC CGC ATC GTC TCG TCC TTC AGC 1343

EKVVNPGEQFSLMCAAKGAP 435

GAG AAG GTC GTC AAC CCC GGC GAG CAG TTC TCA CTG ATG TGT GCG GCC AAG GGC GCC CC3 1403

PPTVTWALDDEPIVRDGSHR 455

CCC CCC ACG GTC ACC TGG GCC CTC GAC GAT GAG CCC ATC GTG CGG GAT GGC AGC CAC CGC 1463

TNQYTMSDGT 465

ACC AAC CAG TAC ACC ATG TCG GAC GGC ACC <3lcE >h t/PlXr) 1493 

(,Cc3 lb /up: £ j 
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8 £2 6 MWLVTFLL3LLDSLHKARPED VCTSLYPVNDSLQQVTFSSS 

03 8 4 9 2 - -KKTYLLVSKLI<LISLTSCl^SFTWHBJRYCU0V8ESDXCP^VXFSBQPINTIY?CSS^ 

P20241EURO MWRQSTIIAAI^VAI^XAOSAESKGNRFPRITX QPAPGIX-LFXVAQQHJCJBJD 

P32004EORA KVVALRYVWPUXCSPCIAIQIPEEYEGHHVHE PFVITEQ8PR-RLWFPTD 

P3 53 31o-ca -mmkeksisaskaslvfflcqmxsax^vpujsxli^^ 

QO 2 2 4 6XONX - MGTATRRXPHLLLVAAVALVSSSAWSSALGSQTT FGFVFEDQF LSVLFPKE3TE 

U11031 — — -MLSWKQLXLLSFXGCLAGELLL- Q OFVrVKBPSNSIFPVGSID 

X65224 MVIiHSHQLTYAGIAFAI^LHHLZSAIEVPLDSNIQSEXjP-QPPTITXQSVX-OYXVDP^ 



••Y 



8 C2 6 VGVVVPCPAAGSPSAALRWIATGDDIYBVPHXRHN^^ 

D38492 GKVSLNCRARASPFPVYKMRlQI-NaDVDLTN-DRYSMV GGNLVXNNFDKQK-D — A 

P202 41EURO NPFIIECEADOQPEPEYSWIKN-GKKPT3WQAYDNRM R 

P3 2 0 04EURA D - XSLKCEASGKPEVQFRWrilD-aVHFXPKEELaVTVYOS PHSaSFTXTGNNSNFAQRFQ 

P3 53 31C-CA N-IVIQCEAKGKPPPSFS«rriW-GTHFDXDXDAQVTKICra — SOTLWN I MNGVXAEAYE 

Q02246XONI EQVL LACRARAS P PATYKWKMN -GTEHXLSPGSRBQLV GGNLVXMNPTXAQ-D-- A 

Ul 1 0 3 1 KXITLNCEARGNPSPHYRWQLN-GSDIOTSLDHRYKLN GONLIVXNFNRNW-D-- T 

X65224 N- XFIECEAKGW*PVFTFSl»n , RN-GKFFNVAXDPKVSfiCRRR- - SGTLVXDFKGGGRPDDYE 

• * * * 

8 f 2 6 NDTFCTAEKAAGXIRSPNZRVKAWREPYTVRVEDQRSMR-GNVAVTKCLXPSSVQSYVS 

D38492 GXTYCLASNNYGMVR ST EAT L S F GYLDPF P P EDRPXVKVKXGKGKVt*I-CD P F YHF PDD - L 

P2 0 2 4 1 EURO GHYQCPASNEFGTATSNSVYVRXAELNAFXDEAAXTLEAVEGEFFMLKGAAPDGFPS — P 

P 3 2 0 0 4 EURA GirRCFASNKLGTAMSHEIRLMAEGAPKifFKET\nCPVTVTEaESVVXiPCNPPPfiAEP • — L 

P3 5 3 3 1G -CA GVYQCTARNSRGAAX SNNZ VZRP SRS PLWTXEXLEPKKVREODSLVLMCRPPVOLP P — P 

Q02246XONI GVTQCLASNPVOTWSREAII-RFGFLQIFSKEERDPVXAHEaVKTW — L 

U11031 GSTQCFATNSLaTXVSREAKLQFAYLEOTKflRMRSRVSVREGQGWLLOGPPPHSGS--L 

X65224 GETTQCFARtTDYGTALSSKIHl^VSRSPLWPKEKVDVZ EVDEGAPLSLQCNPPPGLPP - - P 

8 1 2 6 WSWEXDTVSI IPS NR- - FFXTYHOGLYXSDVQKED- - ALSTYRCITKHKYSOET 

D38492 SYKKLLNRFPVFITM DXRRFVSQ-TNGNliYXANVESSD RONMCFVSS" PSXT 

P20241EURO TVNMMX QES IDG S Z KS INNS R - - MTLD PEGNLWP 8NVTREDAS £ DFYTACSAT SVFRS EY 

P32004EURA RIYWHW8KILHIKQ DER - -VTHGQNGNL YF ANVLT SDN - - HSDYXCKAHF PGTRT I 

F35331G-CA IIFHKDKAFQR1.FQ SER--V8QGLNGDLYPSNVQPEDT- -RVDYICYARFNHTQTI 

Q02246XONX SYKWXXNEFFNFIPT DGRHFVSQ -TTGNLY X ARTNAS D LGNTSCLATSHMDFST 

UX 1 0 3 1 SYAUVFNEYPSFVEE DSRRF VSQ - ETGHLYX AXVEP SD VONTTCWTS - - TVTN 

X6 S 22 4 VTFWMSSSMEPXHQ DKR- -VSQCQNGDLYFSNVMLQDA— QTOVSCNARFHFTHTI 

8 f 2 6 RQSNGARLSVTDPAES 1 PT X LDGFHS QEV WAQHTVEL 

D3 8 492 KSrVFSKFIFLIPIPERTT KPYPADIWQFKOIY — TMMGQNVTL 

P2 0241EURO XIGTOKVLtDVKQMGVSASQ NKH P P VROYVSRRQS - LALRQKRMEL 

P 3 2 0 0 4EURA ZQKEPZDLRVKATNSMZD RXPRLLFPTNSS3HLVALQGQPLVL 

P3 S3 3 1G-CA QQKQP Z SVKVF STOP - • VTERPPVt*LTPMGSTSNKVELRGNVLZ*L 

Q02 2 4 6XONX KSVFSKFAQLNLAAEOTR LFAPSXKARFPAETY - -ALVGQQVTL 

U11031 ARVLGSPTPLVLRSDGVMG EYEPKXELQF PET1»F — AAXGSTVJCL 

X65224 QQXNFYTIJCNneTKKPHNETSLRNOT^ 

* * * 

8f26 PCTASGYF X P A X RWLKDGR P — LPADSRWTKRITGLTISDLRTEDSGTTICEVTNTFaSA 

D38492 ECF ALGN P VPD X RWRKVLZ P - - MPTT AEI STSGAVLK X FN X QLEDEGIi YECEAEM I RGKD 

P2 02 41EURO FCXYGGTPLPQTVWSKJKQRXQWSORXTQGHYGXSLVXRQTllFODAGTTrCOVSNGVGNA 

P3 2 0 0 4 SURA ECIABGFPTPTXKWLRPSGPM- PADRVTYQNHNKTL-QLl-KVaEEDDGETRCLAENSLGSA 

P3 5 3 3 1G-C A ECIAAGLPTPVXRMIXEGGEI*- PANRTFFENFXXTL.XX IDVSEADSGNYKCT ARNTLGST 
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Q02246XONI ECFAFGNFVFRIKWRXVDG SLSPQWTTAEPTLQ I PSVS FEDEOTTECEAENSKORD 

Ul 1 0 3 1 BCFALGOTVPQ INWRRSDOMF - FPT KIKLRXFNGVLS I PNTQQEDTGSTK 1AENSROKN 

X6 5 2 2 4 ECXASaWAPDXmYKKQQEL - P AGKTKLENTOKALRI flNVSEEDSGEYTC LASNXMQS I 
v * * » * * 

8 f 2 6 E-ATGXX^IDPLHVTLTPKiajattXGSTVILSCALTGSPEFTIR 

036492 K-HQARtWQAFPEWVEHINDTEVDIGSDLYWFCVATaJCPIPT^ — 

P20241EURO QSFSXXUWNSVPYFTKEPEXATAAEDEIKn^ECllAAOVFEPKXS^ 

P32004EURA R-HAYYVTVEAAPYWLHXPQSHLYGPOETARLDCOVQCn^^ 

P3 5331Q-CA H-HVXSVTVKAAPYWITAP!WLVLSPaE^ 

Q02246XONI T-VQGJU:rVQAQFEWLKVXSDTEADXQSNLRWGCAAAX2^ — 

U11031 V-ARCRLTYYAKPYWVQLLXDVETAVEDSLYWECRASaKPKPS - 

X6S224 R-HTXSVRVXAAPYm,DEPQNLXIAFaEDGRI,VClUUraOT^ 

# • * * * 

8f2€ S LVLFDEAISIRGLSN - 

D38492 -YAYHKaELRLYDVTPENAGKYQCIAENAYGTIYAKAELKIlJUJlPTFEMNPMXXIU^ 

P 2 0 2 4 1EURO RRTVTDNTIRI INLVXGDTGNYOCNATNS LGYVYXDVYLNVQAKPP — TISEAPAAVSTV 

P32004EURA KYRIQRGALX LSNVQP SDTMVTQCEARNRHGLI»LANAYTYWQLP A- KILTAJDNQTYMAV 

P3 5 3 3 1G-CA SRKVDGDTX X FS AVQERS S A < WQCNA5NEYGYLIJUiAFVNVl*AEPP~ RXLTPANKLYOVX 

Q02246XONI - VEVLAGOLRF SKL S LEDSGMYQCVAENKHGT I YA5 ASLAVQALAPDFRLNPVHRL ZP AA 

Ul 1 0 3 1 - IQIENGALTIANIiNVSDSQMFQCIAEinCHGLXYSSAJCLKVLASAPDFSRNPMXXMIQVQ 

X65224 SREVAGDTXWRDTQIGSSAVYQCNASNEHGYLLANA^SVI^^ 

8f 2 6 ETLLITSXQKSHSGATQCFA 

038492 KGGRVX I ECKPKAAPKPKF SWSXGTEWLVNS5RX L XWEO-GSLEXNNXTRNDGGXYTCFA 

P2 0 2 41EURO DGRNVTIKCRVNGS PKFLVKWLRASNWLT- -GGRYWVO^GDLEIQOVTFSDAGXYTCYA 

P 3 2 0 0 4 EURA QGSTAYLLCKATGAPV P S VQWXiOEDGTTVLQDERFF PYANGTLG IRPLQANDTQ RT T CL A 

P35 3 31G-CA ADSPAX-IDCAYFGSPKPEIEWrRGVXGSILRGNEYWHDNaTLEIPVAQIOrSTGTYTCVA 

Q02246XONX RGGEILXPCQPRAAPKAVVLWSKGTEILVN5SRVTVTPD-GTLIXRNXSRSDBGKYTCFA 

U11031 VGSLVILDCKPSASPRAI-SFWKXGDTVVREQARISL 

X6 5 22 4 QYNRTRIiDCPFFCS PI PTLRWF WGCGNMLDCGNYXAHENGS LEMSMARKEDQGXTTCVA 

* • • • * 

8 £ 2 6 TRKAQTAQDFAI I ALEDGTPRIVS SFSEKVVNPGEQFSLMCAAKGAP - - PFTVTHALDDE 

03 8492 ENNRGXANSTGTt-VITNPT-RXILAPINADITVGENATMQCAASFDPSUDLTFVWSFNGY 

P20241EURO QNKFGEIQADO3I*WXEHT-RXTQttPQMYEVAAC0»TFRCNEAHDDTLEXEXDWWiaX3Q 

P32004EURA ANDQNNVTXMANLKV^AT-QXTOGPRSTXEKKGSRVTFTCQASFDPSLQPSXTIffRGDGR 

P35331G-CA RinaGXTQKEVOLEVKnPT-MXIKQPQYXV^QRSAOASFEO/IKHDPTLIPTVXIflJm — 

Q02246XONI ENFMGKAM9TGXL5VRDAT-KXTLAPSSADXNLGDMLTLiQCKASHDPTMDLfTF < nFrLDDF 

Ul 1 0 3 1 ENQFGKANGTTQLWT E PT - RX I I*APSNMDVAVGES 1 1 LPCQVQHDPLL.DIMFAWYFNGT 

X65224 TNILGKVEAQVRLEVKDPT - RIVRGPEDOWXRGSMPRLKCRVKHDPTI-KLT^rrVLKD- - 



8f26 PXVRDGSHRTNQYTMS — (set* 

D3 8492 VIDFNKEITNIHYQRNFMLDANGELLIRNAQLKMAGRYTCTAQTIVDNSSASADLVVRaP C 

P20241EURO SIDFEAQPR FVKTNDN — SLTIAXTMELDSGEYTCVARTRX-DEATARAtfLXVQDV C 

P32004EURA --DLQBLGD SDKYFX2JDG- -RLVIHSL.DYSDQGNYSCVASTELDVVESRAQLLVVGS Q 

P3 5 3 3 1G-CA — NNELPDD ERFLVGXD- -NLTIMNVTDKDDGTYTC I VNTTX*DSVS ASAVX/TVVAA C_ 

Q02246XOMI PIDFDKPOG — KYRRTNVXETXGDLTILNAQLRHGKIKrrCMAQT\AroSASKEATVXVRaP C 

U11031 LT0FKXDGS--HFEKVGGSS5*GDUiXRNXQLKKSGKYVCHVpTGVDSVSSAASLXVRaS C 

X65224 --DAPLYXG MRMKKEDD--GLTIYGVAEXDQCOYTCVASTELDKDSAXAYLTVLAI C ' 
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AAA? C ATQC ATCT C AATT A.OT CSC C AACCAQQ TOT 09 AAAOTCCC CAGGCT CC C CAGOAGGC AG AAGTATGC 

AAACCftTSCATCTCAATTAQTCAGCAACCATXUOTCCbcCOCCV^lCTCCGCGCATCCCOCCCCTAACTCtCC 

CIlWSTTCCGCCCATTCTCeaCCCCATGCCT^ 

CCrcttSAGCTATTCCAGAJuQTAOTGA^ 

cgagqsgctoacatctctccttcacgc0cco5ccgocctaoctoaaoccqc 
gctctgccgcctcccocctgtottgcctcctga^ 

CTQCTTOCTCAACTCTACC ^ >CTraTOTra 

A9AXA3 TTAAC TOOT 1 AAOTTTAG TCTTTTTGT CTTTTATTTCAGGTCCC AQGTCCOGQATCC 93TQATCCAA 
ATCTAi^AACtaC^CC^AaTOAOTGTMOCTTTACTTCTW 

iseaefltfln t aAXfl&fl <** a^xc aacaxc amB*g r« * ftgT »g t »■« an r« mmi neat i»t 
caaoaxaE* vac e^goamoe ggfi umaangMaTgfl scat we ea^aate qxaaac 
iigyeiiiexe gf»x ff ge Kmneeen wtei r»T ftSAflfiAfl aMfl^nc eexe fl e-r* gflxggflg 
ggQ^aAVAxagq^flQfinT^xaffBffai.QTefaoMAe'PfPqocAccexgcaTttCxagTaAP 
goTxepcxxgrnprxgrmgTflaxxdAta^'PTaflAAA^xTsxegdtQaxaccTQgacM 
'OAaeeg^BBggeoeflTggoaeGxxTCxxaiSiaflggaeiegaBMeyiaafl^ 
■PO flxca w exax<»c eetr-rteeiq taoctxc* xflfx I * a c exeTO c excxa XflflQCXgaax 
flxexexxxcawenenaTTgcfl^xafltfCoxoATrArrpcx^^^i^Ar^AffW^ 





AAA £7T A f^A TV? TTT T GCA flV^r^VTA A f7 fTCCA GA fY** r?TYr A TA t^lHH C?A TV? A f? TA £7A GC PAAQ" TY3f^f?A g fvA fy? f"T 
fTM TiCTC A f^^-^TT fYY? TTGQA HfT TYT TOYS A P PA TV? Tf? A VGCtt TVII'T HT*7"PTY? A fa f? C3Y?fTA £13* CA A A A TA 

■^CArcr^Aryr7VTG^ 

IflfTA fvf? C TA TY^IT^TyTAA &"A£yyjttT/if?£7 CGGCPfirt A TVyyTA£ f? ft A f?A f?£? CA££fi £^Pr?Gl?A^nCfTC«gA gTA 'fg'YS 
^rAGVTAg^A^A^T^^ 

/JC'AGyy'Y^PA fZCyrCXTTtTHk rttlftt TnCAG&A fllf?A f?AfY7TTY7A T'ASiTlTi^ A^TfyTA TCSGr'f T^^^T^^Y^i'T^'^f'TH 

ra*Grci^AeArrrerrT^ 
TS^TACAa^AAAAACCTCCCACACCTCCC^^ 

TG ITT A TTGC AG C TT AT AATG GT TAG AAAT AAJ»lGC AAT AGC AT£ AC AAA TjT 1 C ACAAA 1 ? AAAGC AT T TTTT T 
C AC TG C ATTC T AGT TGT GCT TT3 TC C AAACT C AT^AATCT AT C T"T A TCATG TC TGC ATC C CC-GGG TAOCG AG 
CTCGJLATTAATTCCTCTTCC^ 

TAfC AC C TC ACT C AAAjGGCGaGTAJVTAC GGTTAT C CACAG AATC AGGGG AT AACGCAG£ AAAG AAC AT5TGJU3 

CAAAAGGCCAGCAAAAGGCCAGGAACM 

C TG AC GiAGCA TCA C AAI"JATC<^C<3CT C AAGTCA^^ 

OTTTrcCCCCTOSAAGCTCCCTCOTW 

TTCrcCCTTCCXSGAAGCETiM^ 

GCTCCAAGCTGGGCT^GTGCACGAACCCOC^^ 

TTC^GTCCAACCCGGTAAGACACGACTTATCGCC^ 

G*G T ATC T' AC G CGG TG C T A C AGAGT TC TTGAAiST G GTK C C T AAC T ACGGC T AC ACT AGAA&G AC ACTA T TTG 

GTATCTGCGCTCT^rcAAGCCAGTTACC 

CCGCTC^TAGCGCTGGTTTTTTTGTTTGCAAGC^^^ 

C TTTGATCT T*TT C T A C GGG GT C Tv3 AC G CTC AjGT G G AACG AA*"-AC T C AC G'T T AAGG GATTTTGG TC ATG AG AT 

TATCA-AAAAX3G ATCTTCACCTAC^T C CTTTTAAATTAAP<AATGiAAGT TTT AAATCAATCTAAAflTATAT ATC 

AGTAAACTTGGTCTGACAGTTACCAATC 

C*TCCAt l AGl , T©CCrcAGTCCe^^ 

CTGCAATCATAGCGCGAO&OCCACGCTC 

CCGACCGCAGAAGTGGTCCTGCAACT^ATCCGCC^ 

TAAG^AGTTCGCC^GTTAATAOTTTGCGC 

CGTTTOTTATGGCTTCATTCAGCTCM 

AAAAAGCqSTTAGCTCCTTOSQTCCTC^ 

FIG. 2
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TTATG GC AG CACT OC ATAAfT C TCT FAC TGTCATGC C ATCOM AAGATG C WFTQ TGTQA CTGGTGaG T'AC T* 
CAAC CAAGTCATTCTGAI3AAT AGTG TA T G COTCGaCC GAOTTG CT C TTCC C CgGCCFT CAATAC GGGATAATA 
CGGC GCCACATAGC AGAACTTOAAAAG TCCTC AT C ATTG3 AAAAC G TTCTTCGGGGC GAAAACT CTC AAGGA 
^CTTACCGCTCTTGAaATCCAOTtt^ 
TCACCMCGTT^TXSGOTGAG^^ 

AATG TT GAAT^JCTCATAC TCT TC CT TTTT CAAT ATTATTCAAG C AT TTA'TCAGGG TTATTG TG TC ATC W3CG 
GATACATAVTTC AA l^T ATTTAGAAAAArAAACAA^TAGWCTTCC GCGCaC ATTTCCOC GaAaAGTG CCAC 



FIG. 2



MLL LLLLLGLRLOL SLG 1 1 PVISEEHFDFWNRE AAEALGAAKKLQ FA-QTAAXNL I 
I FLGDGMGVSTVTAARI LKGCKXDKLGFE I PL AMDRFP YVALSKT YNVDKHVPD 
SGATATAYLCGVKGNFQTIGLSAAARFNOCNTTRGWEVI SVMNRAKKAGKSVGV 
VTTTRVQHA S P AGT YAKTVNRNWYSDADVP AS ARQEGC QDI ATQL I SNMDI DVI 
LGGGRKYMFRMGT P D PEY P DDYSQGGTRLDGKHLVQEWL AXRQG ARYVWNRTEL 
MQ AS LDP S VTHLMGLFEFGDMKYE I HRIiSTLD P S LMEMTEAALRLL 5 RNPRGFF 
LFVEGGRI DHGHKESRAYRALTET IHPDDAI ERAGQLTSEEDTL S L VTADHSKV 
F SFGGY PLF.GSSI FGLA PGKARDRKA YTVL L YGNG FG YVLXDGARPDVTE SESG 
S PBYRQQ5 AVPLDEETHAGESVAVFARGPQ AHLVHGVQEQTF I AHVMAFAACLE 
PYTACDLAP P AGTTDAAHP Gg SWPALLPLLAflf LLLtB TATAP 



1 1 P VEEENFDFWNRE AAE ALG AAKKLQ P AQTAAKNL 1 IFLGDGMG V S TVT AARI 
LKGQKKDKLG PE I FLAWDRF PYVALSKTYl^KHVFDSGATATAYLCG VKGNFQ 
T IGLS AAARFNQCNT'rP.GNEVl S VMt^AKKAGKSVGWTTTR VQHA S PAGTYAH 
TVNRNWYSDADVPAS ARQEGCQ DIATQLl SNMDI DVILGGGRKYMFRMGT PDPE 
Y PDDYSQGGTRLDGKHLVOEWLAKRQG ARYVWNRTELMQ AS LDP SVTHLMGLFE ' 
P GDMKYETHRDSTLD P S LMEMTEAALRLLS RNPRGFFLFVEGGP.I DHGHKES RA 
YRALTET IMF DDAI EFAGQLT S E EDTTL S LVTADHSHVF SFGGY P LRG S £ IPG LA 
PGKARDRKA YTVLL YGWG PGYVLKDGARPDVTE S E SGS PEYRQQ S AVPLDEETH 
AGEDVAVFARG PQAKLVHGVQEQTFI AHVMAFAACLEP YTACDLAP PAGTTDAA 
HPG (5G5 r/i fto.-i) 
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CXQ CH£ GCC Wt; CTT OAT OQC CZC DSC AM CTC TCG TCC TTC 

g» w art c?rc xw: occ cjc*: gat? cag ttc tcx cm ats gcc gcc aaa -sec zee ocs hgd 

F^TUTHA^3DBP = V3l3aSKJ?: 455- 

c« aot ore jot tog ggc ctc sac ^at cm Oct? atc cm era gat osc acc cac o»c nfl* 

ACC AM CMS TAE >.CC ATC TCQ= 55^2 Att C"CeJ!> J> a^I^tJ 



FIG. 5
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BUI 

P2M41SUJQ 

QOHt6X0tfX 

1111031 

»m4 



034493 
**«41EU*0 

pasaaic-cA 

UL1M1 
X65224 



6 £26 

M3fl04RJWl 
QM24 4XQK1 

unaai 

X«5*24 



MI4fi2 

PMD041ttXftX 
PiSJJltf-CA 
«50a24<XONt 



9£2* 
D38452 

r^osiiEURo 

PiaC04TOTLA 

uiiQai 

XSS224 



D34452 

^20341euro 

M53310-CA 



-------MflArTyi^^ M _ VaT-^YrWMtOOVTP Srtti 

"-•WWKX^VALICM wi^LrelSSSSS 

- —IfVFVMAYVWLUiCSffCillQr PMrt«HV« -PFV^MmISSvSto 

~!?E!f^ ^ ^MXABLVFPICQHZ BALOVFLOJ JtLt^IJCLfi "QP WITOQJP^-BVTTOrM 

KLBM^OLILLSFlQCUuaffLLL Q -S«TT^l3SSSS 

*KV*LHM^ wSuS^SSS 

GVlWLMKrWtWBRjtXJUlParLOB rSKXXUPV|g^Bm«DVHL»cS»>AHTi«- -Li 

qEWTMmyWAl.SaKIKl^«RSpLMPKFirWX BVDaOAPLSLQCMP t WLPP - - P 

aYiatLunvTVT 3TH- * -mowjnfiQ-uwMLYtiwvnBaro— - tunrttcnms — ps it 

P.I»«Ii>KlUJirC- --- »«*--VTM0flN5HLYT*llIVt^SB»--KSDTTCKAHF POT**! 
J22?SSSSi ? S m " - v 30OUK3PJixJ-«WOPZW- -*VDYI«MlTOm?Tl 

5VMvnrj»PsrvsE — osMnrstj-ErtOBi-riweviiPiD- ••uawrcwrs- tvto 

* ♦ * ♦ * * 

SHKKSSS!^ xwiwwiw— monro 

2S22J??i,i-" BRW ' K WWXWCPXDI* - -TOM9QMV?£< 

JSSKKESSS™ 10 (WMU^PTOMSni<VM*GOPLVL 

" — **™*n#nm*macnunNU* 
QQXwptriJwjrrwpjoitMLiij^cjr/gAi^^ 

MPALOWPVptl rfc*»KVI3P- -«PTTAKB2£GAVDK2rMTdtiD«0LYieaJ«»llU3im 
FIG. 6
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QD224 ttff? AFBtf PVP RMMIOTDCI SLSWWTTAJttTLO ZPgVg PtflbXf r TeaCttHJWiKBIHP 

03 9492 K-BQM* Wg^FWWVEHraDTwM Q^LWroV^TOJCPIP^KWLJTO- 

?2 q % 4 iBtmo gsFatiuwiwrrrT*»BXATAAJwrewF ECTAAavpCTMaMrmraKPsxQrrrap 

P3300 4EUHA Jt-HATfTWrfUUPi^^ 

U1103 1 V- AP£XLTYYAX7YWVQ LXJCnvrf JlVJCD-I LYWttGRAMIFr P KF KfeftfrSALVU Kft — • 

ft * *• * * 

a £2 £ a * - , ^ ^ IiVLPlMXIflWCLSH ' *~ 

DIB 4*2 -YM'AKttWltV&JWlX&XI^ 

t2 02 MWim^31RlIWLVmi3TG WOCKATNStx^ - - TIAWPJULVBTV 

Pi 3 □ □ K»tgRQAL2 tSNVQf 6U TtT^TQCEWttWUlC I-Ij XAVATI YWQ L? X- XI LTAEN$T"¥HXV 

F3 5 3 3 Xff-C/L SRJCVEGEITXXF J A VQSRS S AVYQCrOASMttttYLlMiAJWYI^ * F -RULTPANRLVgVT 

g02 2 4 (iXOHI - VlfVLACDLM S^S&IPiaMYQCVAItfKHaTIYAiAi^ 

u no j i -iQiHTOArrsAMiiivaC'BOMW mlemxtowc YSSMCJcvukSAPpri jumanwrowa 

A* S3 3 * £ WWajODTI VFRDTQ ICS* AVY^TAHHlW^IJtf^^ 

B«36 — — "-BTLLITBJWK«S«WFA 

F202 4a EURO DOMJVT E JfiCfrVtfteF R^LVKWLHASWaT - -^stm^M9DLE3Qo^n^nuErfcyiv 

P« 3 3 10-CA A&6?ALIDCAY?G4PXPEIEWFK£VX» tI*ROinrYVFKDt*7TLBI FVAQKMrOTTTCVA 
CO 224 ttCdtt LXPCQP W*F>^WI^S DTTEIL™^ 

X« 5 224 Qtt4*TKLIX:FFFC« t ?TLR^^ 

* ^ * * *■ 

B f 2 € TWACTAgOPAS: I ALtCOTPRl V9 - - PFTtfWAICM 

M &4S 3 ETMMWtf STOn*VTTOPT-m3 1 UVJ'ENMETVCRtfA'tlfiWAA^'"D f P"-fi"tjDljTTVWirW2Y 

? £ D2 41BUR0 CM»SfclQADOairVWIWT - MTQBPQNV EWAACQ«TZIia3«HETCLBlttlDI«naiCSQ 